DSpace 7 Manual
Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.1 Release Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2 Functional Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
1.3 Technology Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2. Installing DSpace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.1 7.0-7.1 Frontend Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3. Upgrading DSpace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.1 Migrating DSpace to a new server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4. Using DSpace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.1 Authentication and Authorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.1.1 Authentication Plugins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.1.2 Bulk Access Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
4.1.3 Embargo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
4.1.3.1 Pre-3.0 Embargo Lifter Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
4.1.4 Managing User Accounts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
4.1.4.1 Email Subscriptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
4.1.5 Request a Copy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
4.2 CAPTCHA Verification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
4.3 Configurable Entities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
4.4 Curation System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
4.4.1 Bundled Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
4.4.1.1 Bitstream Format Profiler Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
4.4.1.2 Link Checker Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
4.4.1.3 MetadataWebService Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
4.4.1.4 MicrosoftTranslator Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
4.4.1.5 NoOp Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
4.4.1.6 Required Metadata Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
4.4.1.7 Virus Scan Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
4.5 Exporting Content and Metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
4.5.1 Linked (Open) Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
4.5.2 SWORDv1 Client . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
4.5.3 Exchanging Content Between Repositories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
4.5.4 OAI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
4.5.4.1 OAI-PMH Data Provider 2.0 (Internals) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
4.5.4.2 OAI 2.0 Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
4.5.5 OpenAIRE4 Guidelines Compliancy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
4.5.6 Signposting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
4.6 Ingesting Content and Metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
4.6.1 Ingesting HTML Archives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
4.6.2 SWORDv2 Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
4.6.3 SWORDv1 Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
4.6.4 Exporting and Importing Community and Collection Hierarchy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
4.6.5 Importing Items via basic bibliographic formats (Endnote, BibTeX, RIS, CSV, etc.) and online services (arXiv, PubMed, CrossRef, CiNii, etc.) . . . . . . . . . . . . . . . . . . . . 201
4.6.6 Registering Bitstreams via Simple Archive Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
4.6.7 Importing and Exporting Items via Simple Archive Format (SAF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
4.6.8 Importing and Exporting Content via Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
4.6.9 Configurable Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
4.6.10 Submission User Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
4.6.10.1 Live Import from external sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
4.6.10.2 Simple HTML Fragment Markup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
4.6.10.3 Supervision Orders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
4.7 Items and Metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
4.7.1 Authority Control of Metadata Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
4.7.1.1 ORCID Authority . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
4.7.2 Batch Metadata Editing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
4.7.2.1 Batch Metadata Editing Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
4.7.3 DOI Digital Object Identifier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
4.7.4 Item Level Versioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
4.7.5 Mapping/Linking Items to multiple Collections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
4.7.6 Metadata Recommendations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
4.7.7 Moving Items . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
4.7.8 PDF Citation Cover Page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
4.7.9 Updating Items via Simple Archive Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
4.8 Managing Community Hierarchy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
4.9 ORCID Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
4.10 Researcher Profiles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310
4.11 Statistics and Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
4.11.1 SOLR Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
4.11.1.1 SOLR Statistics Maintenance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
4.11.1.1.1 Testing Solr Shards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
4.11.2 DSpace Google Analytics Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332
4.11.3 Exchange usage statistics with IRUS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 335
4.12 User Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
4.12.1 User Interface Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338
4.12.2 User Interface Customization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
4.12.3 User Interface Debugging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 374
4.12.4 Accessibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375
4.12.5 Browse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
4.12.6 Discovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
4.12.7 Contextual Help Tooltips . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
4.12.8 IIIF Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
4.12.9 Multilingual Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401
5. System Administration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
5.1 AIP Backup and Restore . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404
5.1.1 DSpace AIP Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419
5.2 Ant targets and options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430
5.3 Command Line Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432
5.3.1 Database Utilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 434
5.3.2 Executing streams of commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 436
5.4 Handle.Net Registry Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437
5.5 Logical Item Filtering and DOI Filtered Provider for DSpace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 440
5.6 Mediafilters for Transforming DSpace Content . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 445
5.6.1 ImageMagick Media Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 449
5.7 Performance Tuning DSpace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454
5.8 Ping or Healthcheck endpoints for confirming DSpace services are functional . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459
5.9 Scheduled Tasks via Cron . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 460
5.10 Search Engine Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463
5.10.1 Google Scholar Metadata Mappings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469
5.11 Troubleshooting Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 470
5.12 Validating CheckSums of Bitstreams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471
6. DSpace Development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 475
6.1 User Interface Design Principles & Accessibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 476
6.2 REST API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480
6.3 REST API v6 (deprecated) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 484
6.3.1 REST Based Quality Control Reports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 492
6.3.1.1 REST Reports - Collection Report Screenshots with Annotated API Calls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 497
6.3.1.2 REST Reports - Metadata Query Screenshots with Annotated API Calls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 502
6.3.1.3 REST Reports - Summary of API Calls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 506
6.4 Advanced Customisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 509
6.4.1 DSpace Service Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 510
6.5 Curation Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 512
6.5.1 Curation tasks in Jython . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 515
6.6 Development Tools Provided by DSpace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517
6.7 Services to support Alternative Identifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 518
6.8 Batch Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 522
6.9 Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 523
7. DSpace Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 528
7.1 Configuration Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 529
7.2 DSpace Item State Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 580
7.3 Directories and Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 582
7.4 Metadata and Bitstream Format Registries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 585
7.5 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 590
7.5.1 Application Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 592
7.5.2 Business Logic Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 594
7.5.3 DSpace Services Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 611
7.5.4 Storage Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 615
7.6 History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 622
7.6.1 Changes in 7.x . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 623
7.6.2 Changes in Older Releases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 625
8. Learning DSpace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 626
8.1 Community and Collection management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 628
8.1.1 Collection Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 629
8.1.1.1 Create Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 630
8.1.1.2 Delete Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 635
8.1.1.3 Edit Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 638
8.1.1.4 Export Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 657
8.1.2 Community Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 662
8.1.2.1 Create a Community . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 663
8.1.2.2 Delete Community . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 667
8.1.2.3 Edit Community . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 670
8.2 Content (Item) management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 678
8.2.1 Add item . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 679
8.2.2 Delete item . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 690
8.2.3 Edit Item . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 692
8.2.3.1 Authorizations (Manage access to an item) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 695
8.2.3.2 Collection Mapper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 702
8.2.3.3 Edit Bitstream . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 708
8.2.3.4 Edit Metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 715
8.2.3.5 Edit Relationship . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 724
8.2.3.6 Make an Item Private . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 731
8.2.3.7 Move an Item . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 735
8.2.3.8 Reinstate an item . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 739
8.2.3.9 Versioned Item . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 743
8.2.3.10 Withdraw an item . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 750
8.2.4 Embargo an item . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 754
8.2.5 Lease an item . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 763
8.3 DSpace 7 Demo Quick Start . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 773
8.4 Management sidebar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 774
8.5 Menus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 776
8.6 Registry management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 778
8.6.1 Metadata Registry Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 779
8.7 Request-a-copy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 792
8.8 Search - Advanced . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 793
8.9 User management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 794
8.9.1 Add or Manage an E-Person . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 795
8.9.2 Create or manage a user group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 814
Introduction
DSpace is an open source software platform that enables organisations to:
capture and describe digital material using a submission workflow module, or a variety of programmatic ingest options
distribute an organisation's digital assets over the web through a search and retrieval system
preserve digital assets over the long term
This documentation includes a functional overview, which is a good introduction to the system's capabilities and should be readable by non-technical readers. Everyone should read that section first, because it introduces terminology used throughout the rest of the
documentation.
For people actually running a DSpace service, there is an installation guide, and sections on configuration and the directory structure. Support options are
available in the DSpace Support Guide.
For those interested in the details of how DSpace works, and those potentially interested in modifying the code for their own purposes, there is a detailed architecture section.
The DSpace Support Guide lists various places to ask for help, report bugs or security issues, etc.
The DSpace REST API contract documents the REST API behavior. If you want source code documentation, we also provide JavaDocs for the Java API layer, which can be built by running mvn javadoc:javadoc
The DSpace Wiki contains stacks of useful information about the DSpace platform and the work people are doing with it. You are strongly
encouraged to visit this site and add information about your own work. Useful Wiki areas are:
A list of DSpace resources (Web sites, mailing lists etc.)
Technical FAQ
Registry of projects using DSpace
Guidelines for contributing back to DSpace
www.dspace.org has announcements and contains useful information about bringing up an instance of DSpace at your organization.
The DSpace Community List. Join DSpace-Community to ask questions or join discussions about non-technical aspects of building and running a
DSpace service. It is open to all DSpace users. Ask questions, share news, and spark discussion about DSpace with people managing other
DSpace sites. Watch DSpace-Community for news of software releases, user conferences, and announcements about DSpace.
The DSpace Technical List. DSpace developers & fellow community members help answer installation and technology questions, share
information and help each other solve technical problems through the DSpace-Tech mailing list. Post questions or contribute your expertise to
other developers working with the system.
The DSpace Development List. Join discussions among DSpace developers. The DSpace-Dev listserv is for DSpace developers working on the
DSpace platform to share ideas and discuss code changes to the open source platform. Join other developers to shape the evolution of the
DSpace software. The DSpace community depends on its members to frame functional requirements and high-level architecture, and to contribute
programming, testing, and documentation to the project.
Release Notes
Upgrade from any past version of DSpace!
Installing DSpace provides an overview of the DSpace 7 installation process and all prerequisite software. You should review this before attempting an
upgrade, in order to ensure you are running the required versions of Java, Node, etc.
Upgrading DSpace provides a guide for upgrading from any old version of DSpace to v7. As in the past, your data migrates automatically, no matter which
older version you are running. However, as the old XMLUI and JSPUI user interfaces are no longer supported, you must switch to using the new User
Interface.
To upgrade to DSpace 7.6.3 from 7.x or any prior version, see Upgrading DSpace
To upgrade to 7.6.3, you MUST upgrade both the backend and frontend (user interface). Many bug fixes require updating both.
To install DSpace 7.6.3 for the first time, see Installing DSpace.
DSpace 7.6.3 provides bug fixes, accessibility & performance improvements to the 7.6.x platform. No new features are provided. As such this release
should be an easier upgrade for sites already running 7.6.x.
Angular Server Side Rendering (SSR) is no longer performed for all pages in DSpace. By default, new limitations are placed on which
pages (or parts of pages) are rendered via SSR. This was done to help reduce the CPU/memory that is used by bot traffic. Many bots will trigger
SSR unnecessarily, resulting in potential performance issues. You may wish to review these new settings and adapt them, as necessary, to your
production site. These new settings are all configurable, allowing you to decide if the new behavior or old behavior is desired.
By default, SSR is only performed for paths in the sitemap. This means search engine crawlers can continue to access these sitemap
pages via SSR. However, it ensures bot traffic is limited to a subset of the site. See Server Side Rendering (SSR) Settings and #3682
By default, SSR is no longer performed for the search and browse components (on every page they are used). This ensures that
embedded searches/browses on Community, Collection, or Item (Entity) pages are not triggered by search engine crawlers or bots. See
Server Side Rendering (SSR) Settings and #3709
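As an illustration only, these toggles live in the user interface's configuration (e.g. config.prod.yml); the key names and path list below are a sketch and may differ in your version, so confirm them against the Server Side Rendering (SSR) Settings documentation:

```yaml
# Sketch of the SSR limits added in 7.6.3 (key names and values illustrative)
ssr:
  # Only server-side render requests whose path starts with one of these
  # (i.e. the kinds of pages exposed via the sitemap)
  paths:
    - /home
    - /items/
    - /collections/
    - /communities/
    - /handle/
  # Skip SSR for embedded search and browse components
  enableSearchComponent: false
  enableBrowseComponent: false
```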
DSpace has a new, separate command-line log file ([dspace]/log/dspace-cli.log-[date]) for logging the output/results of any scripts
that are started from the command-line. The DSpace backend web application still logs to dspace.log-[date] in the same directory. You may
need to update any local/custom log management scripts to include this new log file.
This change was necessary to fix a major bug where the backend webapp would sometimes stop logging if a "logrotate" was triggered.
See #9832
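For example, a local cleanup script that previously pruned only dspace.log-* files would now need to cover both patterns. A minimal sketch (the helper name, retention period, and directory are hypothetical; assumes GNU find):

```shell
# Hypothetical helper: prune date-stamped DSpace logs older than max_days,
# covering both the backend logs and the new dspace-cli.log-* files.
prune_dspace_logs() {
    log_dir="$1"   # e.g. [dspace]/log
    max_days="$2"  # e.g. 30
    find "$log_dir" -maxdepth 1 \
        \( -name 'dspace.log-*' -o -name 'dspace-cli.log-*' \) \
        -mtime +"$max_days" -delete
}
```

It could be invoked from cron as, say, `prune_dspace_logs /dspace/log 30`.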
The "Edit Item, Bitstreams tab" was refactored to use an HTML table. This was necessary to fix major accessibility and keyboard navigation
issues with the page. The entire page was rebuilt to use a <table> instead of using nested <div> tags. See #3464
On the Browse by Issue Date page, the Year dropdown would sometimes be empty when an invalid date was encountered in metadata. #3408 (Donated by Atmire)
Clicking on Browse by Author after Browse by Issue Date would generate a server error: #573 (Donated by Atmire)
After logging in via Shibboleth, the page content and admin sidebar didn't always load correctly: #3011 (Donated by Michael Spalti)
Many updates to dependencies. Removal of some older, unused dependencies.
Backend dependency updates (includes upgrading to the latest version of Spring 5.3 and Spring Security 5.7)
Frontend dependency updates (includes upgrading to the latest version of Angular 15)
Many other minor bug fixes as listed in Changes in 7.x.
7.6.3 Acknowledgments
The DSpace application would not exist without the hard work and support of its community. Thank you to the many developers who have worked very
hard to deliver all the bug fixes and improvements. This release was entirely volunteer driven!
Development Acknowledgments
A total of 46 unique individuals contributed to 7.6.3.
The above contributor list was determined based on contributions to the "dspace-angular" project in GitHub between 7.6.2 (after July 9, 2024) and 7.6.3
using "git shortlog" on the dspace-7_x branch and excluding all merge commits: git shortlog -s -n -e --no-merges --since 2024-07-09
The above contributor list was determined based on contributions to the "DSpace" project in GitHub between 7.6.2 (after July 9, 2024) and 7.6.3 using "git
shortlog" on the dspace-7_x branch and excluding all merge commits: git shortlog -s -n -e --no-merges --since 2024-07-09
To upgrade to DSpace 7.6.2 from 7.x or any prior version, see Upgrading DSpace
To upgrade to 7.6.2, you MUST upgrade both the backend and frontend (user interface). Many bug fixes require updating both.
To install DSpace 7.6.2 for the first time, see Installing DSpace.
DSpace 7.6.2 provides bug fixes, accessibility & performance improvements to the 7.6.x platform. No new features are provided. As such this release
should be an easier upgrade for sites already running 7.6.x.
Security Fixes
Fix CVE-2024-38364 (low severity) by disabling the ability to open HTML/XML bitstreams in a user's browser. See https://github.com/DSpace/DSpace/security/advisories/GHSA-94cc-xjxr-pwvf (or mailing list announcements) for more details & configuration workaround. (Discovered and reported by Muhammad Zeeshan (Xib3rR4dAr))
Performance Improvements
Disabled Angular "inlineCriticalCSS" in all Server Side Rendering (SSR). This provides a performance improvement to all SSR
generated pages. See https://github.com/DSpace/dspace-angular/pull/2901 (Donated by 4Science)
Media filter performance improvements when filtering a large number of bitstreams (for thumbnail creation or full text indexing). (Donated
by 4Science)
Submission form performance improvements. The submission form has been updated to ensure it no longer loads all related objects. (Donated by Atmire)
Submission configuration reloading performance improvements. This also improves performance of creating a new Collection. (Donated
by Toni Prieto)
Updated robots.txt to stop crawlers from accessing search facets (Donated by Atmire)
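The robots.txt change above amounts to rules along these lines (paths are illustrative; the authoritative list is the robots.txt shipped with your UI version):

```
User-agent: *
# Keep crawlers out of faceted search/browse URLs (example paths only)
Disallow: /search
Disallow: /browse/*
```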
Accessibility improvements in User Interface
Hidden "Skip to main content" button now exists on all pages. (Donated by Atmire)
Header / Navbar / Admin Sidebar accessibility fixes (Donated by 4Science)
Community list accessibility fixes (Donated by Hrafn Malmquist)
Color contrast fixes to "dspace" theme (Donated by Maciej Kleban)
Search results / MyDSpace / Item Edit / Browse by / Login menu accessibility fixes (Donated by Atmire)
Community/Collection Homepage accessibility fixes (Donated by Atmire)
Additional keyboard controls in Submission form (Donated by Atmire)
Browse by Author accessibility fixes (Donated by Neki-it)
"Loading" message accessibility improvements (Donated by Neki-it)
Fixing issue with header menu being keyboard accessible on small screens (Donated by Eike Löhden)
Fix color contrast issues with cookie settings popup (Donated by PCG Academia)
Submission form fixes
Fixed caching issues and instability of PATCH commands when editing date (Donated by 4Science)
Fixed issue where some changes could be lost after a save but reappear after reloading page (Donated by Atmire)
Fixed issue where metadata import (from external source) would only show first value for each metadata field instead of all values (Donated by Atmire)
Fixed issue where vocabulary displayed value was not always appearing when editing an existing submission (Donated by Atmire)
Provide a way to deselect a value from a metadata field dropdown (Donated by Atmire)
Fixed bugs related to creating/deleting Entity relationships in the submission form (Donated by Atmire)
Fixed bugs where type-bind wasn't working for radio buttons and checkboxes (Donated by Max Nuding)
Fixed issues with CrossRef and Scopus metadata import (Donated by Sascha Szott)
Fixed issue with display of no results from CrossRef search (Donated by Philipp Rumpf)
Fixed issues with DataCite metadata import (Donated by Florian Gantner)
Statistics fixes
Solr Statistics: fixed issue where first visit to a repository was not always tracked because of a CSRF token mismatch.
Google Analytics 4 updated to only count file downloads from ORIGINAL bundle. See https://github.com/DSpace/DSpace/pull/8944 (Donated by Atmire)
Item Counts (webui.strengths) are now updating automatically again.
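For reference, item counts are controlled by the webui.strengths settings in the backend configuration; a sketch (verify the exact key names for your version):

```
# Show item counts on Community/Collection pages
webui.strengths.show = true
# Use cached counts instead of computing them on every request
webui.strengths.cache = true
```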
SEO improvements
Legacy bitstream URLs now return a 301 redirect (instead of a 302) (Donated by Atmire)
Missing identifiers now return an HTTP code 404 (Donated by Atmire)
Fixed bug where Community/Collection administrators could not add/edit a logo for a Community/Collection. (Donated by 4Science)
Fixed bug where Amazon S3 data store would sometimes leave around temp files during download process. See https://github.com/DSpace/DSpace/pull/9477 (Donated by 4Science)
Fixed issue where failing ZIP exports could sometimes leave around a work directory.
Fixed issue where virtual metadata of Entities could cause failures during Item versioning & AIP import.
Fixed issue where indexing may fail if Full Text extraction fails (Donated by 4Science)
Fix several issues with editing Entity relationships (Donated by Atmire)
Fix several pagination issues with Item Mapper (Donated by Atmire)
Fixed issue where deleted admin users could cause the Processes page to no longer load properly. (Donated by Atmire)
System alerts now support basic HTML. See https://github.com/DSpace/dspace-angular/pull/3044 (Donated by Abel Gómez)
Fixed issue where batch imported bitstreams may be created without a resource policy type. (Donated by Agustina Martinez)
Fixed issue where Items imported via SWORD may be created without a resource policy type. (Donated by Paulo Graça)
Fixed display issue where Communities with diacritics were not sorted properly (Donated by Paulo Graça)
Fixed issue where a client's "user-agent" was not being forwarded to backend (REST API) (Donated by Alan Orth and Mark Cooper)
OAI-PMH DataCite crosswalk updated to support DataCite version 4.5 (Sponsored by The Library Code)
Minor updates to various dependencies for security purposes (both for user interface and backend)
Many other minor bug fixes as listed in Changes in 7.x.
7.6.2 Acknowledgments
The DSpace application would not exist without the hard work and support of its community. Thank you to the many developers who have worked very
hard to deliver all the bug fixes and improvements. This release was entirely volunteer driven!
Development Acknowledgments
A total of 58 unique individuals contributed to 7.6.2.
The following 35 individuals have contributed directly to the new DSpace (Angular) User Interface in this release (ordered by number of GitHub commits): Alexandre Vryghem (alexandrevryghem), Tim Donohue (tdonohue), Davide Negretti (davide-negretti), Art Lowel (artlowel), Alan Orth (alanorth), Ricardo
Saraiva (rsaraivac), Sascha Szott (saschaszott), Lotte Hofstede (LotteHofstede), Jens Vannerum (jensvannerum), Oscar Chacón (oscar-escire), Pierre
Lasou (pilasou), Yury Bondarenko (ybnd), Francesco Molinaro (FrancescoMolinaro), Giuseppe Digilio (atarix83), Kuno Vercammen, Michael Spalti
(mspalti), Thomas Misilo (misilot), Abel Gómez (abelgomez), Andreas Awouters (AAwouters), Kim Shepherd (kshepherd), William Welling (wwelling), Yana
De Pauw (YanaDePauw), Maciej Kleban (Dawnkai), Victor Hugo Duran Santiago (VictorHugoDuranS), Agustina Martinez (amgciadev), Andrea Barbasso
(AndreaBarbasso), Bram Luyten (bram-atmire), Eike Löhden (Leano1998), Florian Gantner (floriangantner), Marie Verdonck (MarieVerdonck), Mohamed
Ali, NTK, Nona Luypaert (nona-luypaert), Max Nuding (hutattedonmyarm), Reeta Kuukoski (reetagithub).
The above contributor list was determined based on contributions to the "dspace-angular" project in GitHub between 7.6.1 (after Nov 15, 2023) and 7.6.2
using "git shortlog" on the dspace-7_x branch and excluding all merge commits: git shortlog -s -n -e --no-merges --since 2023-11-15
The above contributor list was determined based on contributions to the "DSpace" project in GitHub between 7.6.1 (after Nov 15, 2023) and 7.6.2 using
"git shortlog" on the dspace-7_x branch and excluding all merge commits: git shortlog -s -n -e --no-merges --since 2023-11-15
To try out DSpace 7.6.1 immediately, see Try out DSpace 7. This includes instructions for a quick-install via Docker, as well as information on our DSpace
demo site.
To upgrade to DSpace 7.6.1 from 7.x or any prior version, see Upgrading DSpace
To upgrade to 7.6.1, you MUST upgrade both the backend and frontend (user interface). Many bug fixes require updating both.
To install DSpace 7.6.1 for the first time, see Installing DSpace.
DSpace 7.6.1 provides bug fixes & performance improvements to the 7.6.x platform. No new features are provided. As such this release should be an
easier upgrade for sites already running 7.6.
Performance improvements
User interface no longer repeatedly requests the "/api" endpoint of the backend. https://github.com/DSpace/dspace-angular/issues/2482 (Donated by Atmire)
"Edit Group" page performs much better for groups with a lot of members. https://github.com/DSpace/DSpace/issues/9052 and https://github.com/DSpace/dspace-angular/issues/2512
Workflow Tasks page loads more quickly for a logged-in EPerson who is a member of many groups. https://github.com/DSpace/DSpace/issues/9053 (Donated by Atmire)
OAI-PMH no longer has high memory usage during/after harvesting. https://github.com/DSpace/DSpace/issues/8846 (Donated by
Christian Bethge, ULB)
Improved performance of "./dspace update-handle-prefix" script. https://github.com/DSpace/DSpace/issues/9066 (Donated by 4Science)
Improved performance of "./dspace checker" script. https://github.com/DSpace/DSpace/issues/9180 (Donated by 4Science)
Improved performance of "./dspace generate-sitemaps" script (and automatic sitemaps). https://github.com/DSpace/DSpace/issues/3182
(Donated by 4Science)
General fixes
Specific browser plugins no longer cause the User Interface to appear blank. https://github.com/DSpace/dspace-angular/issues/2450 (Donated by floriangantner, U of Bamberg)
Emails sent by DSpace no longer have blank "Subject" fields. https://github.com/DSpace/DSpace/issues/8921 (Donated by Mark Wood,
Indiana University)
Navbar menu on mobile devices now works in Firefox/Safari. https://github.com/DSpace/dspace-angular/issues/2372 (Donated by
eScire)
Git is no longer required as a build dependency. https://github.com/DSpace/DSpace/pull/9032 (Donated by Cottage Labs)
Item view fixes
Handle redirects now work properly when "ui.nameSpace" config (on frontend) is customized. https://github.com/DSpace/dspace-angular/issues/2517 (Donated by Atmire)
Bitstream URLs now work properly if "ui.nameSpace" config (on frontend) is customized. https://github.com/DSpace/dspace-angular/issues/2446 (Donated by Atmire)
Thumbnails now work properly for Bitstreams having special characters. https://github.com/DSpace/DSpace/issues/9112 (Donated by
floriangantner, U of Bamberg)
Fixed display issue of media viewer (controls hidden behind header). https://github.com/DSpace/dspace-angular/issues/2341 (Donated
by 4Science)
MathJax code is no longer displayed twice. https://github.com/DSpace/dspace-angular/issues/2170 (Donated by Atmire)
Fixed issue where some "Edit Item" pages were visible (but not usable) to anonymous users. https://github.com/DSpace/dspace-angular/issues/2609 (Donated by 4Science)
Browse/Search fixes
Hierarchical browse indexes for controlled vocabularies can now be disabled. https://github.com/DSpace/DSpace/issues/8947 (Donated
by Toni Prieto)
Pagination is now reset when changing searches. https://github.com/DSpace/dspace-angular/issues/2159 (Donated by eScire)
Unicode characters now work properly in search filters. https://github.com/DSpace/DSpace/issues/8914 (Donated by Toni Prieto)
Submission/Workflow form fixes
Submission form no longer hangs / enters infinite loop in specific scenarios. https://github.com/DSpace/dspace-angular/issues/1924 and
https://github.com/DSpace/dspace-angular/issues/2577 (Donated by 4Science)
Freetext is now supported in controlled vocabulary fields where "closed=false". https://github.com/DSpace/dspace-angular/issues/2435 (Donated by Atmire, with support from the International Livestock Research Institute)
Date field can now be modified more easily when editing an existing submission. https://github.com/DSpace/dspace-angular/issues/2588 (Donated by 4Science)
"metadata.hide" fields are no longer hidden from submitters/reviewers. https://github.com/DSpace/dspace-angular/issues/1997 (Donated
by Toni Prieto)
"Type" dropdown no longer changes to first entry in list when pressing Enter. https://github.com/DSpace/dspace-angular/issues/2145 (Donated by eScire)
Workflow curation tasks can now be "queued" for later. https://github.com/DSpace/DSpace/issues/9070 (Donated by Agustina Martinez,
Cambridge University)
Removed "Add More" button when additional sections not available. https://github.com/DSpace/dspace-angular/issues/2535 (Donated
by Atmire)
Added submitter information in "dc.description.provenance" as in older versions. https://github.com/DSpace/DSpace/issues/8585 (Donated by Arvo Consultores)
Authentication fixes
Entering an incorrect password no longer results in a blank login menu in Firefox. https://github.com/DSpace/dspace-angular/issues/2515
(Donated by Atmire)
LDAP is no longer broken if groupmaps were enabled. https://github.com/DSpace/DSpace/issues/8920 (Donated by wwuck)
Special groups are only added for the authentication system the user used to log in. https://github.com/DSpace/DSpace/issues/9127 (Donated by 4Science)
Login popup now lists login methods in order of configuration. https://github.com/DSpace/dspace-angular/issues/2365 (Donated by
Atmire)
Admin tools fixes
A new Metadata Schema in the registry no longer requires a page reload to appear. https://github.com/DSpace/dspace-angular/issues/1081 (Donated by eScire)
Moving items between collections now inherits policies correctly. https://github.com/DSpace/DSpace/issues/8987 (Donated by Atmire)
Statistics fixes
Google Analytics bitstream statistics are now limited to the "ORIGINAL" bundle. https://github.com/DSpace/DSpace/issues/8938 (Donated by Atmire)
Theming fixes
Configuring themes using "handle" now works properly. https://github.com/DSpace/dspace-angular/issues/2348 (Donated by Atmire)
Extending a theme no longer causes it to potentially render multiple times. https://github.com/DSpace/dspace-angular/issues/2346 (Donated by Atmire)
Search Engine Optimization fixes
Access restricted and "Non-Discoverable" Items are no longer listed in Sitemaps. https://github.com/DSpace/DSpace/issues/5394 and https://github.com/DSpace/DSpace/issues/5343 (Donated by 4Science)
Accessibility fixes
Fixes to accessibility of Community List (/community-list) page. https://github.com/DSpace/dspace-angular/issues/2251 (Donated by
Cottage Labs)
Added an invisible "Skip to main content" button on all pages. https://github.com/DSpace/dspace-angular/issues/2523 (Donated by
Atmire)
Translation bug fixes
Translation files (i18n files) are now hashed, so they are no longer reloaded unless they change. https://github.com/DSpace/dspace-angular/issues/2461 (Donated by Atmire)
Localized "default_[language-code].license" files are supported again. https://github.com/DSpace/DSpace/issues/8882 (Donated by PCG
Academia)
Translations are now working properly for item status badges in MyDSpace. https://github.com/DSpace/dspace-angular/issues/2387 (Donated by Mirko Scherf)
Replication Task Suite version 7.6 has been released to add compatibility with all DSpace 7.6.x releases. This Maven plugin can be used to
provide extra curation tasks for AIP Backup and Restore.
Many other bug fixes and dependency updates as listed in Changes in 7.x.
7.6.1 Acknowledgments
The DSpace application would not exist without the hard work and support of its community. Thank you to the many developers who have worked very
hard to deliver all the bug fixes and improvements. This release was entirely volunteer driven!
Development Acknowledgments
A total of 38 unique individuals contributed to 7.6.1.
The above contributor lists were determined based on contributions to the "dspace-angular" project in GitHub between 7.6 (after June 23, 2023) and 7.6.1:
https://github.com/DSpace/dspace-angular/graphs/contributors?from=2023-06-23&to=2023-11-15&type=c
The above contributor list was determined based on contributions to the "DSpace" project in GitHub between 7.6 (after June 23, 2023) and 7.6.1: https://github.com/DSpace/dspace/graphs/contributors?from=2023-06-23&to=2023-11-15&type=c
To try out DSpace 7.6 immediately, see Try out DSpace 7. This includes instructions for a quick-install via Docker, as well as information on our sandbox/demo site for DSpace 7.
To upgrade to DSpace 7.6 from 7.x or any prior version, see Upgrading DSpace.
To install DSpace 7.6 for the first time, see Installing DSpace.
DSpace 7.6 provides new features & bug fixes to the 7.x platform.
Bulk Access Management allows someone with administrative privileges (site-wide or over a single object) to perform bulk modifications to the permissions of objects they administer. This makes it possible to add or remove embargoes or other access restrictions on many objects at once. (Developed and co-funded by 4Science with the support of the University of Cambridge)
Support for selecting Primary Bitstream for archived Items, similar to version 6.x. The existing "primary bitstream" selector now functions
properly when editing a Bitstream.
Item counts can now be displayed for all Communities/Collections similar to version 6.x. (Donated by PCG Academia)
Browse Hierarchical Controlled Vocabularies This new feature allows users to browse/search for items quickly using the same controlled
vocabulary configured in your submission forms. (Donated by Atmire)
Signposting support for items and bitstreams. This new feature embeds signposting links/metadata into pages and responses, to better
support FAIR guiding principles. (Developed and donated by 4Science)
Import Simple Archive Format Zip files from a remote URL. This feature enhances the existing batch import feature to allow you to specify the
URL of the ZIP file to import, instead of using the drag & drop upload. (Developed and donated by 4Science)
ImageMagick Thumbnails for Video files (MP4). A new "ImageMagick Video Thumbnail" plugin can be used to generate thumbnails from Video
files using FFmpeg. (Donated by Abel Gómez)
Ability to map Item submission forms via Entity Type to easily configure a submission form for all Collections accepting the same Entity Type.
See the Configurable Entities documentation. (Donated by Paulo Graça)
New default Privacy Statement and End User Agreement. The new default text of these policies can be found by visiting the links in the footer of our demo site. (Donated by DSpaceDirect)
Oracle support has been removed as was previously announced in March 2022 on our mailing lists.
Fixed issue where deselecting a filter could return an invalid query. (Donated by eScire)
Ensured Collections/Communities are now listed alphabetically in all search/selection popups (and sort order is configurable in config.yml) (Donated by Arvo Consultores)
Fixed issues with "-i" and "-r" params to index-discovery script (Donated by Agustina Martinez)
Submission form fixes
Fixed issue where "list" input type would not work properly when set to required
Adding/removing new relationships (between Entities) would sometimes not update on the form (Donated by Atmire)
Date picker was not always respecting start/end dates for access options (Donated by Mark Wood)
Hint was missing from subject keywords field, and "list" and "tag" input types (Donated by Atmire)
Fixed several issues with imports from external sources to avoid cached errors and double submissions (Donated by Atmire and
4Science and Alan Orth and others)
Fixed issues where a required, hidden field could make it impossible to complete submission (Donated by 4Science)
Disabled collection box once submission is in workflow state. Collection cannot be changed during workflow. (Donated by eScire)
Fixed visibility of read-only fields and sections (Donated by 4Science)
Fixed issue where CC0 was displayed twice in Creative Commons section (Donated by Atmire)
Statistics fixes
Fixed issue where Solr statistics were not loading after sharding (Donated by Nicholas Woodward)
Fixed issue where restricted objects appeared in Solr statistics with an empty name
Fixed issue where the referrer for a "view" event was almost always incorrect (Donated by Atmire)
Fixed issue where "search_result" events were not being captured in statistics. (Donated by Atmire)
Permission inheritance fixes
When moving an Item to a new Collection (Donated by Agustina Martinez)
Where Bundles were accidentally inheriting access permissions for Bitstreams (Donated by Kim Shepherd)
Administrative fixes
Enhanced functionality for deleting multiple bitstreams (of the same Item) at once. Fixed issue where sometimes only the first bitstream
would be deleted. (Donated by Atmire)
Fixed issue where only one thumbnail was deleted when running media-filter in force mode. (Developed and donated by 4Science)
Fixed issue where an Admin couldn't reset another user's password if Captcha was enabled. (Donated by @Ma-Tador)
Fixed issues with checksum reporter/checker no longer working properly. (Developed and donated by 4Science)
Fixed issue where Collection Admin could not edit an Item Template (Donated by D&L)
Fixed issue where EPerson deletion could result in page loading issues (Developed and donated by 4Science)
Fixed bug where the edit form for a Community/Collection/Item would not always update if you selected a different object immediately (Developed and donated by 4Science)
Improved validation of input fields when creating new Metadata schema or fields. (Donated by Atmire)
Fixed issue where workflow actions (from "Administer workflow" page) were not working. (Developed and donated by 4Science)
Fixed issue where curation tasks could not be performed by Community or Collection Admins. (Developed and donated by 4Science)
Request a Copy fixes
Fixed issue with response page not loading
Fixed issue where "helpdesk" strategy required authentication in order to respond. (Donated by Arvo Consultores)
Moved text of approval/rejection emails to backend along with other email templates (Donated by Mark Wood)
Added Multipart upload support to Amazon S3 Bitstore plugin (allows for larger uploads). (Developed and donated by 4Science)
Added "webui.content_disposition_format" configuration to support always downloading (as an attachment) specific file formats. (Donated by Atmire)
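A hedged sketch of what this might look like in local.cfg (the value syntax shown here is illustrative; check the configuration reference for whether formats are given as MIME types or extensions):

```
# Always send these formats as attachments (forces download instead of
# opening in the browser); the property may be repeated for multiple formats
webui.content_disposition_format = text/html
webui.content_disposition_format = text/xml
```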
Fixed issue where a page could fail to load if the language code stored in a cookie was invalid (Donated by Atmire)
Fixed issue where Shibboleth authentication would not reload the current page after authentication (Donated by Michael Spalti)
Fixed several bugs with the ORCID Authority Control plugin and added more detailed documentation on enabling it. (Donated by uofmsean with
support from others)
Fixed issue where RSS feed was wrongly sorting alphabetically. Corrected to sort by date (like recent submissions). (Donated by Nicholas
Woodward)
Fixed issue where you could not enable the video viewer without also enabling the image viewer. (Donated by Atmire)
Fixed issue where Batch import (Zip) would throw a confusing error if the file size was too large (Donated by Nicholas Woodward)
Fixed issue where Batch import (Zip) was not cleaning up temporary files after an error occurred (Donated by Nicholas Woodward)
GDPR compliance fix. An unnecessary external font was being loaded into the UI. (Donated by Nicholas Woodward)
Minor updates/fixes to IIIF support (Donated by Michael Spalti)
Made many more user interface components themeable
Numerous user interface accessibility and usability enhancements (Donated by Adam Doan and eScire and Atmire)
Docker script build enhancements (Donated by DSpaceDirect)
Numerous other small bug fixes and dependency updates to frontend and backend.
7.6 Acknowledgments
The DSpace application would not exist without the hard work and support of its community. Thank you to the many developers who have worked very
hard to deliver all the new features and improvements. We deeply appreciate the financial contributions by those institutions that have contributed to the
DSpace Development Fund. Also, thanks to the users who provided input and feedback on the development, those who contributed documentation, as well
as those who participated in the testathons.
Development Acknowledgments
A total of 45 unique individuals contributed to 7.6, with major institutional contributions coming from 4Science and Atmire.
The above contributor lists were determined based on contributions to the "dspace-angular" project in GitHub between 7.5 (after February 17, 2023) and
7.6: https://github.com/DSpace/dspace-angular/graphs/contributors?from=2023-02-17&to=2023-06-23&type=c
The above contributor list was determined based on contributions to the "DSpace" project in GitHub between 7.5 (after February 17, 2023) and 7.6: https://
github.com/DSpace/dspace/graphs/contributors?from=2023-02-17&to=2023-06-23&type=c
Financial Acknowledgments
We gratefully recognize those institutions who have generously pledged to financially support one or more 7.x releases via the DSpace Development Fund.
A list of those funders can be found on the DSpace Development Fund page.
To try out DSpace 7.5 immediately, see Try out DSpace 7. This includes instructions for a quick-install via Docker, as well as information on our sandbox
/demo site for DSpace 7.
To upgrade to DSpace 7.5 from 7.x or any prior version, see Upgrading DSpace.
To install DSpace 7.5 for the first time, see Installing DSpace.
DSpace 7.5 provides new features & bug fixes to the 7.x platform.
Subscribe to email updates from a Community or Collection (ported from the DSpace-CRIS project, partially donated and developed by
4Science)
NOTE: This feature requires new scheduled cron settings to enable sending of emails. See Scheduled Tasks via Cron
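The cron entries for sending subscription emails can be sketched as follows. The `subscription-send` script name and `-f` frequency flags are taken from the Scheduled Tasks via Cron documentation; the paths and times are placeholders, so verify against your own installation before use.

```
# Hypothetical crontab entries ([dspace] = your installation directory).
# Send daily subscription emails at 4am, weekly on Sundays, monthly on the 1st:
0 4 * * *   [dspace]/bin/dspace subscription-send -f D
0 5 * * 0   [dspace]/bin/dspace subscription-send -f W
0 6 1 * *   [dspace]/bin/dspace subscription-send -f M
```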
Supervision orders allow Administrators to assign Group(s) to supervise other users' (in-progress) submissions
System wide alerts which allow you to display a site-wide banner to announce scheduled downtime or maintenance (Login as an Admin and visit
the "System-wide Alert" menu option)
Ability to disable self-registration or restrict to specific email domains. See Authentication by Password documentation for more details.
Search interface supports filtering by controlled vocabularies. See the new "vocabularies" option in config.*.yml. By default, only the SRSC
vocabulary is available in DSpace, but additional ones may be added as described in Authority Control of Metadata Values
Support for custom Configurable Workflow steps, including "Select Single Reviewer Workflow" and "Score Review Workflow" (which
were in v6.x). See Additional workflow steps documentation for more details.
Ability to add Contextual Help Tooltips on any page. Currently, only a basic example is provided, with more likely to be added in future
DSpace releases. In the meantime, additional tooltips may be added manually on a page-by-page basis. See Contextual Help Tooltips for more
details.
Basic MediaViewer now supports captioning for audio/video files. (Donated by yingjin) The caption file must be provided alongside the
video/audio file. See documentation at Media Viewer Settings
On Item pages, the "Edit Item > Metadata" tab has been redesigned to allow for easier reordering of metadata values, etc.
On Item pages, metadata values can automatically link to "Browse By" results for that value. (Donated by The Library Code with the
Support of German Institute for Urban Affairs) The existing "webui.browse.links.<n>" settings in dspace.cfg are now supported in version 7.x. See
Links to Other Browse Contexts section of the Configuration documentation.
"Show identifiers" submission step allows you to optionally pre-register Handles or DOIs for all new submissions. (Donated by The Library
Code with support of Technische Universität Hamburg and Technische Informationsbibliothek) See Configuring the Identifiers step and
Configuring pre-registration of Identifiers
New DataCite plugin for importing metadata from DataCite when starting a new submission. (Donated by @johannastaudinger,
@floriangantner and @philipprumpf)
New "dspace database skip" command can be used to skip problematic database migrations during upgrade to 7.x. See this common
migration error that has impacted some older DSpace sites during their upgrade to 7.x.
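The new database command above can be sketched like this. The invocation assumes the command takes the problematic migration's version as an argument, as described in the Upgrading documentation; the placeholder is not a real migration version.

```
# List all migrations and their states, then skip one flagged migration.
# (<migration-version> is a placeholder; use the version reported
#  in your own migration error.)
[dspace]/bin/dspace database info
[dspace]/bin/dspace database skip <migration-version>
```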
Brazilian Portuguese (Português do Brasil) language updates donated by Lucas Zinato Carraro (lucaszc)
Catalan (Català) language added & donated by Toni Prieto
French (Français) language updates donated by Pierre Lasou (pilasou)
German (Deutsch) language updates donated by Sascha Szott (saschaszott)
Greek (Ελληνικά) language updates donated by LisAtUP
Kazakh (Қазақ) language updates donated by myrza1
Polish (Polski) language added & donated by PCG Academia & Michał Dykas (michdyk)
Spanish (Español) language updates donated by Toni Prieto & Cristian Emanuelle Guzmán Suárez (CrisGuzmanS)
Ukrainian (Українська) language added & donated by AndrukhivAndriy
7.5 Acknowledgments
The DSpace application would not exist without the hard work and support of its community. Thank you to the many developers who have worked very
hard to deliver all the new features and improvements. We deeply appreciate the financial contributions by those institutions that have contributed to the
DSpace Development Fund. Also, thanks to the users who provided input and feedback on the development, those who contributed documentation, as well
as those who participated in the testathons.
Development Acknowledgments
A total of 44 unique individuals contributed to 7.5, with major institutional contributions coming from 4Science and Atmire.
Suárez (CrisGuzmanS), nikunj59, yingjin, Marie Verdonck (MarieVerdonck), Toni Prieto (toniprieto), Mark Wood (mwoodiupui), Art Lowel (artlowel),
Nicholas Woodward (nwoodward), Sergio Fernández Celorio (sergius02), Lucas Zinato Carraro (lucaszc), Sascha Szott (saschaszott), Michał Dykas
(michdyk), Vincenzo Mecca (vins01-4science), Pierre Lasou (pilasou), Max Nuding (hutattedonmyarm), dsteelma-umd, Mykhaylo Boychuk
(Micheleboychuk), myrza1, AndrukhivAndriy, LisAtUP, tony
The above contributor lists were determined based on contributions to the "dspace-angular" project in GitHub between 7.4 (after October 6, 2022) and 7.5:
https://github.com/DSpace/dspace-angular/graphs/contributors?from=2022-10-06&to=2023-02-16&type=c
The above contributor list was determined based on contributions to the "DSpace" project in GitHub between 7.4 (after October 6, 2022) and 7.5: https://git
hub.com/DSpace/DSpace/graphs/contributors?from=2022-10-06&to=2023-02-16&type=c
Financial Acknowledgments
We gratefully recognize those institutions who have generously pledged to financially support one or more 7.x releases via the DSpace Development Fund.
A list of those funders can be found on the DSpace Development Fund page.
To upgrade to DSpace 7.4 from 7.x or any prior version, see Upgrading DSpace.
To install DSpace 7.4 for the first time, see Installing DSpace.
DSpace 7.4 provides new features & bug fixes to the 7.x platform.
Recent Submissions are now listed on the homepage. (See "recentSubmissions" options) (donated by DSpaceDirect, developed by DSquare
Technologies)
Batch export (to Zip) of Collections or Items available in the Admin UI
Batch import (from Zip) available in the Admin UI.
Thumbnails are now displayed in all search/browse screens. (See "showThumbnails" option) (donated by Michael Spalti, Willamette University)
Support for Google Captcha on "New user registration" form. (See "registration.verification.enabled" in dspace.cfg) (donated by 4Science)
Support for Google Analytics 4 (“Universal Analytics” is still also supported)
Support for Markdown, HTML and MathJax in Item abstracts. Support for line breaks in all metadata fields.
Support for Remote Handle Resolver now included, allowing for Handle Server to be run remotely.
Enhanced support for Amazon S3 as a storage location, including support for IAM Roles.
Allow for deletion of old Processes from the Admin UI. Bulk deletion script (process-cleaner) also available.
Request a Copy can now be sent to multiple recipients at once (donated by Mark Wood, IUPUI)
Privacy statement and end user agreement can now be disabled. (See "info" options)
Add configurations for page sizes to many browse/search pages. (See various "pageSize" options in User Interface Configuration) (donated by
Mark Wood, IUPUI, and Mark Cooper, LYRASIS)
Fixed issue where “start-handle-server” command line tool wasn’t working properly (donated by Jean-François Morin, Université Laval and Mark
Wood, IUPUI)
MyDSpace: Entities are now displayed properly, workflow tasks now refresh after state change
Submission: After drag & drop of a file, the user goes directly to the submission form. (donated by Nicolas Boulay, Université Laval)
Statistics: fixed issue where file names were sometimes missing or replaced by UUID.
Thumbnails: restricted thumbnails are now supported and are displayed if you have access.
Search: admins were seeing private/withdrawn items in normal searches. Those are now only visible on the "Admin Search" page.
Browse by pages: fixed issues where pagination was not working properly on all browse by pages and diacritics were not handled properly in filter
boxes
Curation tasks: the output of the task is now visible from the Admin UI.
Password authentication security enhancements: users must provide their current password when changing their password. Password minimum
requirements are now configurable, allowing you to require users to create more secure passwords (See "authentication-password.regex-validation.pattern" in dspace.cfg)
Shibboleth authentication: Admin menu now reloads automatically after login
Numerous caching & performance fixes to ensure various pages will now automatically refresh whenever underlying objects change.
See the 7.4 milestone for frontend and backend for a list of all changes applied in 7.4.
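The password-strength setting noted in the fixes above lives in the backend configuration. A minimal local.cfg sketch might look like the following; the regex shown is an example policy of our own, not the shipped default.

```
# Hypothetical local.cfg fragment: require 8+ characters with at least
# one lowercase letter, one uppercase letter and one digit.
# (The pattern is illustrative; see dspace.cfg for the default.)
authentication-password.regex-validation.pattern = ^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$
```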
French (Français) language updates donated by Pierre Lasou (pilasou)
German (Deutsch) language updates donated by Sascha Szott (saschaszott)
Greek (Ελληνικά) language added & donated by Kostis Alexandris (kostisalex)
Hindi (हिन्दी) language added & donated by DSquare Technologies
Kazakh (Қазақ) language added & donated by myrza1
Brazilian Portuguese (Português do Brasil) updates donated by João Fernandes (joao-uefrom), Lucas Zinato Carraro (lucaszc) and Danilo Felicio
Jr (danilofjr)
Spanish (Español) language updates donated by Arvo Consultores y Tecnología, S.L.
Swedish (Svenska) language added & donated by Urban Andersson (jokermanse). Additional updates donated by Reeta Kuuskoski (reetagithub)
7.4 Acknowledgments
The DSpace application would not exist without the hard work and support of its community. Thank you to the many developers who have worked very
hard to deliver all the new features and improvements. We deeply appreciate the financial contributions by those institutions that have contributed to the
DSpace Development Fund. Also, thanks to the users who provided input and feedback on the development, those who contributed documentation, as well
as those who participated in the testathons.
Development Acknowledgments
A total of 49 unique individuals contributed to 7.4, with major institutional contributions coming from 4Science and Atmire.
The above contributor lists were determined based on contributions to the "dspace-angular" project in GitHub between 7.3 (after June 24, 2022) and 7.4: htt
ps://github.com/DSpace/dspace-angular/graphs/contributors?from=2022-06-24&to=2022-10-06&type=c
The above contributor list was determined based on contributions to the "DSpace" project in GitHub between 7.3 (after June 24, 2022) and 7.4: https://githu
b.com/DSpace/DSpace/graphs/contributors?from=2022-06-24&to=2022-10-06&type=c
Financial Acknowledgments
We gratefully recognize those institutions who have generously pledged to financially support one or more 7.x releases via the DSpace Development Fund.
A list of those funders can be found on the DSpace Development Fund page.
To upgrade to DSpace 7.3 from 7.x or any prior version, see Upgrading DSpace.
To install DSpace 7.3 for the first time, see Installing DSpace.
DSpace 7.3 provides new features & bug fixes to the 7.x platform.
ORCID Authentication and synchronization to a DSpace Researcher Profile (ported from the DSpace-CRIS project, partially donated and
developed by 4Science). See ORCID Integration and Researcher Profiles
Import content directly from 9 new external services including CrossRef, Scopus, Web of Science, PubMed Europe, CiNii, NASA
Astrophysics Data System (ADS), VuFind.org, SciELO.org, and the European Patent Office (EPO) (ported from the DSpace-CRIS project,
partially donated and developed by 4Science). See Importing Items via basic bibliographic formats (Endnote, BibTex, RIS, CSV, etc) and online
services (arXiv, PubMed, CrossRef, CiNii, etc)
Configurable Entities now support Item Versioning. It is now possible to create versions of Entities which automatically retain all relationships.
Examples of how versioning works can be found in "Versioning Support" section of "Configurable Entities". Also refer to Item Level Versioning
documentation for a general overview of how versioning works in DSpace.
Admin "Health" menu provides basic control panel functionality (based on 6.x Control Panel). When logged in as an Administrator, select
"Health" from the side menu. You'll see a "Status" tab which provides useful information about the status of the DSpace backend, and an "Info"
tab which provides an overview of backend configurations and Java information.
Validate a Batch Metadata CSV before applying changes (similar to 6.x). When uploading a CSV for batch updates (using "Import" menu), a
new "Validate Only" option is selected by default. When selected, the uploaded CSV will only be validated & you'll receive a report of the detected
changes in the CSV. This allows you to verify the changes are correct before applying them. (NOTE: applying the changes requires re-
submitting the CSV with the "Validate Only" option deselected)
Export search results to a CSV (similar to 6.x). When logged in as an Administrator, after performing a search a new "Export search results as
CSV" button appears. Clicking it will export the metadata of all items in your search results to a CSV. This CSV can then be used to perform
batch metadata updates (based on the items in your search results).
Submission forms support display of SHERPA/RoMEO publisher policies when an ISSN is entered (similar to 6.x). A new "sherpaPolicies"
optional submission step exists. When enabled, a new "Sherpa policies" section appears in the form. If a submitter enters an ISSN, then that form
section will display publisher policies based on that ISSN. See "Configuring the Sherpa Romeo step" in the "Submission User Interface" documentation.
Submission forms support type-based fields (ported from the DSpace-CRIS project by Kim Shepherd of The Library Code, with support from
Technische Universität Berlin) (similar to 6.x): Based on the selected Type (dc.type), the form may dynamically add/change the metadata fields
available. See "Item Type Based Metadata Collection" in Submission User Interface
Preview items during workflow approval (similar to 6.x). When viewing the list of all items under workflow (in "My DSpace" under "Show: Workflow
Tasks"), it's now possible to preview each item (before claiming or reviewing) by clicking the View button.
Withdrawn items now show a tombstone page (similar to 6.x). When an item is withdrawn, accessing its homepage as a non-Administrator
now brings you to a "tombstone" page which notes that the Item was withdrawn. (Administrative users can still see the entire item when they are
logged in)
RSS/Atom feeds for Site, Community & Collection pages (Donated by Atmire) (similar to 6.x). On the homepage, and on every Community or
Collection page, a Syndication Feed icon now appears. Clicking it brings you to an Atom feed of the most recent submissions (either sitewide, or
specific to that Community/Collection).
Optionally, Item "access status" badges (e.g. "open access", "restricted", "metadata only", "embargoed") can be displayed in all Item
lists. (Donated by Université Laval) This feature can be enabled via the new "showAccessStatuses" setting in config.*.yml. See User Interface
Configuration and https://github.com/DSpace/dspace-angular/issues/1566
Optionally, a welcome email can be sent to all newly registered users (Donated by Mark H. Wood of IUPUI). This can be enabled using the
new "mail.welcome.enabled" backend configuration in your local.cfg.
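The welcome-email feature above is a one-line backend toggle; a minimal local.cfg sketch, assuming the property name quoted in the release note:

```
# Hypothetical local.cfg fragment: send a welcome email to each
# newly registered user (this feature is off unless enabled here).
mail.welcome.enabled = true
```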
Oracle Database support has been deprecated. It will be removed in mid-2023. All sites should plan a migration to PostgreSQL. See https://gith
ub.com/DSpace/DSpace/issues/8214
Migrated text extraction to use Apache Tika. There is now a single "Text Extractor" media filter plugin. See Mediafilters for Transforming DSpace
Content for more details.
Major frontend performance improvements: initial page load is much quicker. All assets are now zipped/minified. Caching improvements.
Frontend can be easily run from a subpath (e.g. https://my.university.edu/dspace) (Donated by Harvard University, developed by William Welling)
"custom" theme was throwing TypeErrors in 7.2. That has been fixed.
Renamed "private/public" status to "non-discoverable/discoverable", which is more accurate. This item status only meant the item was not
findable/discoverable through DSpace search/browse; it has nothing to do with whether the item is anonymously accessible.
Added "Now Showing..." informational counts to all Browse By pages (similar to 6.x)
Changed Browse "jump to" boxes to act as a filter, rather than jumping to a specific page. Updated display to make this clearer.
Fixed several bugs related to resource policies
Couldn't set an end date on a resource policy in some situations
Couldn't edit an existing resource policy (previously you had to delete & recreate the policy)
Fixed bug where resubmitting an Item move could result in Item deletion.
Fixed several bugs in submission form, including
Improved form validation and error display
Occasionally, null metadata values were being saved
In some scenarios, submissions could be completed without agreeing to the license.
Fixed bug where Creative Commons step wasn't loading properly when configured
Fixed bug where radio buttons were not working properly when submission field was set as a "list" that was not repeatable.
Fixed several bugs in bitstream editing form
Fixed bug where browse by issue date wouldn't work if it encountered an invalid date in the first item. (Donated by 4Science)
Fixed bug where bitstream edit page wouldn't load if a bitstream's access policies were all deleted.
Fixed bug where restricted bitstream downloads were not checking a user's "special groups" which may provide them permissions. (Donated by
4Science)
Frontend has been upgraded to Angular 13.
Backend has been updated to Spring Boot 2.6
Numerous other dependency updates
See the 7.3 milestone for frontend and backend for a list of all changes applied in 7.3.
7.3 Acknowledgments
The DSpace application would not exist without the hard work and support of its community. Thank you to the many developers who have worked very
hard to deliver all the new features and improvements. We deeply appreciate the financial contributions by those institutions that have contributed to the
DSpace Development Fund. Also, thanks to the users who provided input and feedback on the development, those who contributed documentation, as well
as those who participated in the testathons.
Development Acknowledgments
A total of 37 unique individuals contributed to 7.3, with major institutional contributions coming from 4Science and Atmire.
The above contributor lists were determined based on contributions to the "dspace-angular" project in GitHub between 7.2 (after Feb 3, 2022) and 7.3: http
s://github.com/DSpace/dspace-angular/graphs/contributors?from=2022-02-03&to=2022-06-24&type=c
The above contributor list was determined based on contributions to the "DSpace" project in GitHub between 7.2 (after Feb 3, 2022) and 7.3: https://github.
com/DSpace/DSpace/graphs/contributors?from=2022-02-03&to=2022-06-24&type=c
Financial Acknowledgments
We gratefully recognize the following institutions who have generously pledged financially to support the 7.3 release via the DSpace Development Fund.
DSpace 7.0, 7.1 and 7.2 all used a bundled version of the Apache Spring Libraries which are vulnerable to RCE (remote command execution). The
CVE-2022-22965 vulnerability is described in more detail at https://spring.io/blog/2022/03/31/spring-framework-rce-early-announcement
If you cannot upgrade immediately, other workarounds / alternative fixes are documented in the patch PR at https://github.com/DSpace/DSpace/pull/8231
DSpace 7.2.1 only contains an update to the Apache Spring Libraries to ensure DSpace is not vulnerable to CVE-2022-22965. As such, it was only a
Backend / REST API release. The DSpace 7.2 Frontend (UI) can be used with the DSpace 7.2.1 Backend.
1. Upgrade your DSpace backend (REST API) to version 7.2.1 immediately. This backend is compatible with the DSpace Frontend version 7.2
(only)
a. If you are unable to perform this upgrade, you may patch your 7.0 or 7.1 site by applying the changes in PR #8231. Instructions can be
found in that PR.
2. Optionally, upgrade your Apache Tomcat to version 9.0.62 (which also has extra guards against this vulnerability).
3. Make sure to restart Tomcat after updates have been applied.
At this time, DSpace 6.x and below appear unaffected by CVE-2022-22965, as they all used Java/JDK 8 (or below) which is documented as not
impacted. The vulnerability is only possible when using Java/JDK 9 or above.
To try out DSpace 7.2 immediately, see Try out DSpace 7. This includes instructions for a quick-install via Docker, as well as information on our sandbox
/demo site for DSpace 7.
To upgrade to DSpace 7.2 from 7.x or any prior version, see Upgrading DSpace.
To install DSpace 7.2 for the first time, see Installing DSpace.
DSpace 7.2 provides new features & bug fixes to the 7.x platform.
Runtime Configuration for the User Interface (Donated by Harvard University, developed by William Welling): In DSpace 7.0 and 7.1, changes
to your User Interface Configuration required rebuilding the entire UI (which could take 10+ minutes). As of 7.2, all User Interface configurations
are loaded at runtime, so changing a configuration requires only a quick restart of the User Interface (which usually takes only a few
seconds). The configuration format also changed from Typescript to YAML to support this feature. A "yarn env:yaml" migration script is provided
to migrate the old format to the new one. See User Interface Configuration for more details.
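The Typescript-to-YAML migration above can be sketched as follows. The script name comes from the release note itself; exact input and output file names may differ by version, so consult User Interface Configuration before relying on this.

```
# Run from the dspace-angular root directory.
# Converts the old Typescript environment settings into the new
# YAML configuration format (review the generated YAML afterwards).
yarn env:yaml
```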
Add Item Embargoes / Restrictions in the Submission User Interface: A new, optional "itemAccessConditions" step exists in the Submission
configuration. Enabling it adds a section which allows you to select access restrictions, embargoes or leases. It also allows you to select whether
the Item is discoverable via search/browse. See Submission User Interface and Embargo documentation for details.
Feedback Form: A feedback form is now linked in the footer of every page, as long as a "feedback.recipient" is specified in your local.cfg. This
feature allows users to contact the configured "feedback.recipient" from any page in the site.
OpenID Connect (OIDC) Authentication Plugin (Ported from the DSpace-CRIS project by Hardy Pottinger of California Digital Library, with
support from 4Science): DSpace now supports single sign-on using OpenID Connect (OIDC), which allows it to support authentication through
providers such as Google, Microsoft, Amazon, etc. For more information on setting this up, see the Authentication Plugins page.
IIIF Enhancements (Donated by Michael Spalti of Willamette University): Includes support for adding IIIF metadata using Importing and
Exporting Items via Simple Archive Format (SAF), editing IIIF bitstream metadata from the User Interface (when editing an existing Bitstream),
and a new "./dspace iiif-canvas-dimensions" CLI tool for auto-populating IIIF canvas dimensions in bulk.
Running "filter-media" (Mediafilters) from Processes User Interface. Administrators can now run the "filter-media" script from the Admin UI
("Processes" menu), in order to immediately update thumbnails, full text indexing, etc. See Mediafilters for Transforming DSpace Content for more
details about this script.
Improved support for custom "Browse By" configurations. User Interface "Browse by" options are now retrieved dynamically from the REST
API, based on the backend's configured browse by indexes (see "webui.browse.index.*" options documented in the Configuration Reference)
Backend has added support for JDK 17. The DSpace backend now supports either JDK 11 or JDK 17.
Frontend has been upgraded to Angular 11.
Solr now uses a connection pool by default (Donated by Mark H. Wood of IUPUI). See Configuration Reference for details of new "solr.client.
*" configs in dspace.cfg.
User interface would load indefinitely if the REST API was unavailable. Now, an error page is displayed to let you know the REST API is
unresponsive.
User Interface deployment required the "node_modules" folder to exist, making it more difficult to containerize (e.g. Docker). Now, the UI can be
deployed via only the "dist" folder. See https://github.com/DSpace/dspace-angular/issues/1410 (Donated by Harvard University, developed by
William Welling)
Searches with invalid syntax or special characters would load indefinitely. Now an error is displayed if the syntax is invalid in some way.
On Item page, very long metadata fields or file names would break the page layout
When an Item had an invalid or empty "dspace.entity.type" metadata field, it was unable to be deleted.
On Submission page, if you drag & drop a file to start a submission, the Collection selection window sometimes did not load properly.
On Submission page, sometimes the "Deposit" button would not enable even when all required fields were filled out. The "Deposit" button is now
always enabled, but it will block submission if required fields are missing.
On Submission page, fixed several bugs with editing / setting embargoes or access restrictions on uploaded files.
Statistics were always accessible publicly, even if restricted to Administrators. Statistics are now only accessible to Admins when
usage-statistics.authorization.admin.usage is set to true in local.cfg
Administrators were not able to reset passwords of other users.
On Processes page, scripts could not be run without parameters. Additionally, fixed display of dates so they always appear as UTC time
When both Shibboleth and DSpace password authentication were enabled, users were able to change their password in DSpace in order to
bypass Shibboleth.
On backend, improved indexing performance. (Donated by 4Science)
On backend, improved file download performance to avoid connection leaks when S3 is used as backend storage.
Numerous other minor bug fixes or accessibility improvements. See the 7.2 milestone for frontend and backend for a list of all changes applied in
7.2.
Scottish Gaelic (Gàidhlig) user interface support added (Donated by Donald I Macdonald and Stòrlann Nàiseanta na Gàidhlig)
German (Deutsch) user interface support had a syntax error which caused it not to work properly
7.2 Acknowledgments
A total of 28 unique individuals contributed to 7.2, with major institutional contributions coming from 4Science and Atmire.
The above contributor lists were determined based on contributions to the "dspace-angular" project in GitHub between 7.1 (after Oct 28, 2021) and 7.2: http
s://github.com/DSpace/dspace-angular/graphs/contributors?from=2021-10-28&to=2022-02-03&type=c
Backend / REST API Acknowledgments
The following 23 individuals have contributed directly to the DSpace backend (REST API, Java API, OAI-PMH, etc) in this release (ordered by number of
GitHub commits): Tim Donohue (tdonohue), Michele Boychuk (Micheleboychuk), Michael Spalti (mspalti), Hardy Pottinger (hardyoyo), Yana De Pauw
(YanaDePauw), Joost Fock (joost-atmire), Kevin Van de Velde (KevinVdV), Corrado Lombardi (corrad82-4s), Jose Vicente Ribelles Aguilar (jvribell),
Davide Negretti (davidenegretti-4science), Luca Giamminonni (LucaGiamminonni), Yury Bondarenko (ybnd), Ben Bosman (benbosman), Marie Verdonck
(MarieVerdonck), Bruno Roemers (bruno-atmire), Mark Wood (mwoodiupui), Hrafn Malmquist (J4bbi), Paulo Graça (paulo-graca), Andrea Bollini (abollini),
Giuseppe Digilio (atarix83), William Welling (wwelling), Kristof De Langhe (Atmire-Kristof), Samuel Cambien (samuelcambien)
The above contributor list was determined based on contributions to the "DSpace" project in GitHub between 7.1 (after Oct 28, 2021) and 7.2: https://github
.com/DSpace/DSpace/graphs/contributors?from=2021-10-28&to=2022-02-03&type=c
DSpace 7.0 and 7.1 both used a bundled version of the Apache Log4j Library vulnerable to RCE (remote command execution). The CVE-2021-44228
vulnerability is described in more detail at https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-44228 and https://logging.apache.org/log4j/2.x
/security.html#Fixed_in_Log4j_2.15.0
DSpace 7.1.1 only contains an update to the Apache Log4j Library to ensure DSpace is not vulnerable to CVE-2021-44228. As such, it was only a
Backend / REST API release. The DSpace 7.1 Frontend (UI) can be used with the DSpace 7.1.1 Backend.
To ensure your 7.x site is completely secure, perform ALL the following:
1. Upgrade your DSpace backend (REST API) to version 7.1.1 immediately. This backend is compatible with the DSpace Frontend version 7.1
a. If you are unable to perform this upgrade, you may patch your 7.0 or 7.1 site by applying the changes in PR #8065. Specifically, update
your ./pom.xml to have <log4j.version>2.15.0</log4j.version>. Then rebuild & redeploy your backend. Make sure to restart Tomcat.
2. Upgrade to Apache Solr v8.11.1 (or above), to ensure your Solr is patched for CVE-2021-44228
a. If you are unable to perform this upgrade, you may patch your current Solr by ensuring that `-Dlog4j2.formatMsgNoLookups=true` is
specified in your `SOLR_OPTS` environment variable. For more information, see https://solr.apache.org/security.html#apache-solr-
affected-by-apache-log4j-cve-2021-44228
3. If you use the Handle.Net Registry Support in DSpace 7.x, make sure to restart your Handle Server. This will ensure it is using the new version of
log4j as well.
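For step 2a above, the Solr workaround can be applied through Solr's environment configuration. A sketch for solr.in.sh follows; the file's location varies by installation, and the JVM flag is the one quoted in the Solr security advisory.

```shell
# Hypothetical solr.in.sh fragment: disable log4j message lookups to
# mitigate CVE-2021-44228 on Solr versions that cannot be upgraded yet.
SOLR_OPTS="$SOLR_OPTS -Dlog4j2.formatMsgNoLookups=true"
```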
At this time, DSpace 6.x and below appear unaffected by CVE-2021-44228, as they all used log4j v1 exclusively with a default configuration that is not
impacted.
Immediately after version 7.1.1 was released, the log4j community announced a secondary, less severe vulnerability (CVE-2021-45046) which was
patched in a log4j v2.16.0 release.
This fix is NOT included in 7.1.1. But, you can immediately apply this secondary patch by applying the changes in https://github.com/DSpace/DSpace/pull
/8070. This is again a one line change. Simply update your ./pom.xml to have <log4j.version>2.16.0</log4j.version>. Then rebuild & redeploy your
backend.
DSpace 7.1 contains a security fix to the backend (REST API) for all sites running 7.0. See CVE-2021-41189 for details.
DSpace 7.1 was released on November 1, 2021
To try out DSpace 7.1 immediately, see Try out DSpace 7. This includes instructions for a quick-install via Docker, as well as information on our sandbox
/demo site for DSpace 7.
To upgrade to DSpace 7.1 from 7.0 or any prior version, see Upgrading DSpace.
To install DSpace 7.1 for the first time, see Installing DSpace.
DSpace 7.1 provides new features, security & bug fixes to the 7.x platform.
Request a Copy (Backend donated by Mark H. Wood of IUPUI): Similar to v6.x, users can now ask the original author or submitter (or a help
desk email) for an emailed copy of access restricted files. This provides users with a way to privately get copies of restricted files, should the
request be approved. A request can be submitted by simply clicking on an access restricted bitstream in the UI. Approval or rejection of the
request occurs by clicking the link sent in the request email.
Item Versioning: Similar to v6.x, administrators or submitters can now create new versions of Items. A new Item version can be created by
logging in & clicking the "Create new version" button (next to the "Edit this item" button) on an Item's page. The new version is then created via
the normal Item submission form (prepopulated with all existing information). Once created, all versions of an Item are visible on the Item page in
the "Version History" section.
Item Versioning is enabled by default, but can be disabled via configuration.
Entities are not yet supported for Versioning.
Configure Collections to harvest content via OAI-PMH (OAI Harvesting): Similar to 6.x (XMLUI), on the "Edit Collection" page's "Content
Source" tab, there's an option to specify "This collection harvests its content from an external source". When enabled for a Collection, you can
configure an external OAI-PMH instance (including another DSpace site) to harvest from. Once configured in the UI, harvesting is completed
based on the configured schedule in your backend's local.cfg or oai.cfg.
IIIF Support (Donated by Michael Spalti of Willamette University, with support & enhancements donated by 4Science): DSpace now supports the
International Image Interoperability Framework (IIIF.io), including an embedded IIIF viewer (Mirador) in the UI. IIIF support is disabled by default,
but can be easily enabled via configuration. Enabling IIIF also requires installing an IIIF image server (e.g. Cantaloupe). For more details, please
see the linked documentation.
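As a rough sketch only (the property names below are assumptions; verify them against the IIIF documentation for your exact 7.x version), enablement is a local.cfg flag on the backend plus a separately installed image server:

```shell
# Hypothetical local.cfg fragment -- property names and URL are
# illustrative and should be checked against the IIIF documentation:
#
#   iiif.enabled = true
#   # Base URL of your IIIF image server (e.g. a Cantaloupe instance):
#   iiif.image.server = https://iiif.example.edu/cantaloupe/iiif/2
```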
Ability to "extend" other User Interface themes: In your environment.prod.ts, you can now specify that one theme "extends" another. This
allows you to inherit all settings from the extended theme by default. See the "Extending other Themes" section of the User Interface
Customization documentation.
Configure one Entity Type per Collection (Ported by 4Science from their DSpace-CRIS project): When Configurable Entities are enabled, in
the "Edit Collection" page you can select an Entity Type (e.g. Person, Project, Journal, etc.) that the Collection will accept. Once configured, this
Collection will only accept new Submissions of that Entity Type, and will be one of the recommended Collections to Submitters whenever they
start a new Submission of that Entity Type. See "Configure Collections for each Entity type" section of the Configurable Entities documentation.
Support for importing Entities & Relationships via the Simple Archive Format (Donated by tysonlt): This is achieved via a new, optional
"relationships" file in the Simple Archive Format directory. See the documentation for more details.
Support for importing Project Entities with funding information via the OpenAIRE API (Donated by Paulo Graça ): When importing a new
"Research Project" Entity, a new "Funding OpenAIRE API" option is available, allowing you to import a Project from the OpenAIRE API complete
with all its funding information (Funder, Funder Identifier, Funding Stream and Funding ID). This is implemented as a new external source via Live
Import from external sources.
Command-line script to help test the connection between your UI and your REST API. Several people who installed 7.0 early ran into
issues configuring the UI and REST API properly. A new "yarn config:check:rest" script has been added to the frontend codebase to help validate
the connection with your REST API. It should also provide more descriptive errors (should they occur) which will help us to debug future issues
others may encounter. See the "Frontend Installation" instructions (step 4) in Installing DSpace for more details.
Logical Item Filtering and DOI Filtered Provider for DSpace. This introduces a framework to define rules using boolean logic to filter items.
Furthermore, these filters can be used to decide whether a DOI should be minted for a certain item. Donated by The Library Code with the
support of TUHH and TIB.
[HIGH] CVE-2021-41189: In 7.0, a Community or Collection Admin could escalate their permissions to become a full Administrator. A quick fix is
also provided for sites running 7.0. (Reported by Andrea Bollini of 4Science)
German (Deutsch) user interface support was updated (Donated by The Library Code)
Spanish (Español) user interface support added (Donated by Gustavo S. Ferreyro)
7.1 Acknowledgments
A total of 27 unique individuals contributed to 7.1, with major institutional contributions coming from 4Science and Atmire.
The above contributor lists were determined based on contributions to the "dspace-angular" project in GitHub between 7.0 (after July 29, 2021) and 7.1: https://github.com/DSpace/dspace-angular/graphs/contributors?from=2021-07-29&to=2021-10-27&type=c
(tdonohue), Andrea Bollini (abollini), tysonlt, Yury Bondarenko (ybnd), Corrado Lombardi (corrad82-4s), Yana De Pauw (YanaDePauw), Nicholas
Woodward (nwoodward), Alan Orth (alanorth), Davide Negretti (davidenegretti-4science), Marie Verdonck (MarieVerdonck), Andrew Wood
(AndrewZWood), Ben Bosman (benbosman).
The above contributor list was determined based on contributions to the "DSpace" project in GitHub between 7.0 (after July 29, 2021) and 7.1: https://github.com/DSpace/DSpace/graphs/contributors?from=2021-07-29&to=2021-10-27&type=c
DSpace 7.0 is the largest release in the history of DSpace software. While retaining the "out-of-the-box" aspects DSpace is known for, it represents a
major evolution of the platform including:
A completely new User Interface (demo site). This is the new Javascript-based frontend, built on Angular.io (with support for SEO provided by
Angular Universal). This new interface is also customizable via HTML and CSS (Sass) and Bootstrap. For early theme building tips see User
Interface Customization
A completely new, fully featured REST API (demo site), provided via a single "server" webapp backend. This new backend is not only a REST
API, but also still supports OAI-PMH, SWORD (v1 or v2) and RDF. Anything you can do from the User Interface is now also possible in our REST
API. See REST API documentation for more details.
A newly designed search box. Search from the header of any page (click the magnifying glass). The search results page now features
automatic search highlight, expandable & searchable filters, and optional thumbnail-based results (click on the “grid” view).
A new MyDSpace area to manage your submissions & reviews. MyDSpace includes a new drag & drop area to start a new submission, and
easily search your workflow tasks or in-progress submissions to find what you were working on. (Login, click on your user profile icon, click
“MyDSpace”). Find workflow tasks to claim by selecting “All tasks” in the “Show” dropdown.
A new configurable submission user interface, featuring a one-page, drag & drop submission form. This form is completely configurable and
can be prepopulated by dragging & dropping a metadata file (e.g. ArXiv, CSV/TSV, Endnote, PubMed, or RIS) or by importing via external
APIs (e.g. ORCID, PubMed, Sherpa Journals or Sherpa Publishers) (video). Local controlled vocabularies are also still supported (video). See
Submission User Interface for more details.
Optional, new Configurable Entities feature. DSpace now supports “entities”, which are DSpace Items of a specific ‘type’ which may have
relationships to other entities. These entity types and relationships are configurable, with two examples coming out-of-the-box: a set of Journal
hierarchy entities (Journal, Volume, Issue, Publication) and a set of Research entities (Publication, Project, Person, OrgUnit). For more
information see Configurable Entities.
Dynamic user interface translations (Click the globe, and select a language). Interested in adding more translations? See DSpace 7
Translation - Internationalization (i18n) - Localization (l10n).
A new Admin sidebar. Login as an Administrator, and an administrative sidebar appears. Features available include:
Quickly create or edit objects from anywhere in the system. Either browse to the object first, or search for it using the Admin sidebar.
Processes UI (video) allows Administrators to run backend scripts/processes while monitoring their progress & completion. (Login as an
Admin, select "Processes" in sidebar)
Administrative Search (video) combines retrieval of withdrawn items and private items, together with a series of quick action buttons.
Administer Active Workflows (video) allows Administrators to see every submission that is currently in the workflow approval process.
Bitstream Editing (video) has a drag-and-drop interface for re-ordering bitstreams and makes adding and editing bitstreams more
intuitive.
Metadata Editing (video) introduces suggest-as-you-type for field name selection of new metadata.
Login As (Impersonate) another account allows Administrators to debug issues that a specific user is seeing, or do some work on behalf
of that user. (Login as an Admin, Click "Access Control" in sidebar, Click "People". Search for the user account & edit it. Click the
"Impersonate EPerson" button. You will be authenticated as that user until you click "Stop Impersonating EPerson" in the upper right.)
Improved GDPR alignment (video)
User Agreement required for all authenticated users to read and agree to. (Log in for the first time, and a sample user agreement will display.
After agreeing to it, it will not appear again.)
Cookie Preferences are now available for all users (anonymous or authenticated). A cookie preference popup appears when first
accessing the site. Users are given information on what cookies are added by DSpace, including a Privacy Statement which can be used to
describe how their data is used.
User Accounts can be deleted even if they've submitted content in the past.
Support for OpenAIREv4 Guidelines for Literature Repositories in OAI-PMH (See the new “openaire4” context in OAI-PMH).
Search Engine Optimization: Tested and approved by the Google Scholar team, DSpace still includes all the SEO features you require: a robots.txt, Sitemaps and Google Scholar "citation" tags.
Video/Image Content Streaming (Kindly donated by Zoltán Kanász-Nagy and Dániel Péter Sipos of Qulto): When enabled, DSpace can now
stream videos & view images full screen, using an embedded viewer. (See the "mediaViewer" settings in the environment.common.ts to enable.)
Basic Usage Statistics (video) are available for the entire site (See "Statistics" menu at top of homepage), or specific Communities, Collections
or Items (Click on that same "Statistics" menu after browsing to a specific object).
Additional features are listed in the Beta release notes below. Also, give it a try on our demo site & see what you discover!
DSpace 7 does not yet include all the features of DSpace 6.x
DSpace 7.0 represents a major evolution of the platform into a new, modern web architecture. This means there are tons of new and redesigned features
in 7.0. However, in order to get this release in your hands sooner, DSpace Steering decided to delay some 6.x features for later 7.x releases. So, if you
don't see a 6.x feature yet in 7.0, it'll likely be coming soon in a later 7.x release. For a prioritized list of upcoming features see "What features are coming
in a later 7.x release?" on our DSpace Release 7.0 Status page.
Additional major changes to be aware of in the 7.x platform (not an exhaustive list):
XMLUI and JSPUI are no longer supported or distributed with DSpace. All users should immediately migrate to and utilize the new Angular
User Interface. There is no migration path from either the XMLUI or JSPUI to the new User interface. However, the new user interface can be
themed via HTML and CSS (SCSS).
The old REST API ("rest" webapp from DSpace v4.x-6.x) is deprecated and will be removed in v8.x. The new REST API (provided in the
"server" webapp) replaces all functionality available in the older REST API. If you have tools that rely on the old REST API, you can still
(optionally) build & deploy it alongside the "server" webapp via the "-Pdspace-rest" Maven flag. See REST API v6 (deprecated)
The Submission Form configuration has changed. The "item-submission.xml" file has changed its structure, and the "input-forms.xml" has
been replaced by a "submission-forms.xml". See Submission User Interface
ElasticSearch Usage Statistics have been removed. Please use SOLR Statistics or DSpace Google Analytics Statistics.
The traditional, 3-step Workflow system has been removed in favor of the Configurable Workflow System. For most users, you should see
no effect or difference. The default setup for this Configurable Workflow System is identical to the traditional, 3-step workflow ("Approve/Reject",
"Approve/Reject/Edit Metadata", "Edit Metadata")
The old BTE import framework was removed in favor of the Live Import Framework (features of BTE have been ported to Live Import)
Apache Solr is no longer embedded within the DSpace installer. Solr now MUST be installed as a separate dependency alongside the
DSpace backend. See Installing DSpace.
A large number of old/obsolete configurations were removed. See the "7.0 Configurations Removed" section below.
See Upgrading DSpace for more hints on the upgrade from any old version of DSpace to 7.x
Additional Resources
Video presentations / Workshops from OR2021 (June 2021) showing off many of the new features & configurations of DSpace 7: DSpace 7 at
OR2021
Within the [dspace]/config/ directory, these are the configuration files which were deleted:
dc2mods.cfg
input-forms.xml / dtd (REPLACED BY submission-forms.xml, see Submission User Interface)
log4j.properties (REPLACED BY log4j2.xml)
log4j-console.properties (REPLACED BY log4j-console.xml)
log4j-solr.properties (no replacement as Solr now must be installed separately)
news-side.html
news-top.html
news-xmlui.xml
workflow.xml (REPLACED BY ./spring/api/workflow.xml)
xmlui.xconf / dtd
emails/bte_* (BTE import framework was removed in favor of Live Import from external sources)
modules/controlpanel.cfg
modules/elastic-search-statistics.cfg (Elastic Search support was removed in favor of Solr)
modules/fetchccdata.cfg
modules/publication-lookup.cfg
spring/api/bte.xml (BTE import framework was removed in favor of Live Import from external sources)
spring/oai/* (OAI is now part of the backend "server webapp" and needs no separate configurations)
spring/xmlui/*
Within the dspace.cfg main configuration file, the following settings were removed:
log.init.config (replaced by log4j2.xml)
webui.submit.blocktheses
webui.submit.upload.html5
webui.submission.restrictstep.enableAdvancedForm
webui.submission.restrictstep.groups
webui.submit.enable-cc
webui.browse.thumbnail.*
webui.item.thumbnail.*
webui.preview.enabled
webui.strengths.show
webui.browse.author-field
webui.browse.author-limit
webui.browse.render-scientific-formulas
recent.submissions.*
webui.collectionhome.*
plugin.sequence.org.dspace.plugin.SiteHomeProcessor
plugin.sequence.org.dspace.plugin.CommunityHomeProcessor
plugin.sequence.org.dspace.plugin.CollectionHomeProcessor
plugin.sequence.org.dspace.plugin.ItemHomeProcessor
plugin.single.org.dspace.app.webui.search.SearchRequestProcessor
plugin.single.org.dspace.app.xmlui.aspect.administrative.mapper.SearchRequestProcessor
plugin.named.org.dspace.app.webui.json.JSONRequest
plugin.single.org.dspace.app.webui.util.StyleSelection
webui.bitstream.order.*
webui.itemdisplay.*
webui.resolver.*
webui.preferred.identifier
webui.identifier.*
webui.mydspace.*
webui.suggest.*
webui.controlledvocabulary.enable
webui.session.invalidate
itemmap.*
jspui.*
xmlui.*
mirage2.*
A full list of all changes / bug fixes in 7.x is available in the Changes in 7.x section.
7.0 Acknowledgments
DSpace 7.0 was the largest release in the history of DSpace, with 1,026,797 lines of code changed and 79 unique individuals contributing to either the
frontend or backend.
Financial Contributors
We gratefully recognize the following institutions who together have generously contributed financially to support the DSpace 7 staged release program
(see DSpace 7 Release Goals), and individuals who devoted time to fundraising:
Auburn University
Cornell University
Pascal Becker
Dalhousie University
Duke University
ETH Zurich, ETH Library
Fraunhofer Gesellschaft
Imperial College London
Indiana University–Purdue University, Indianapolis
LYRASIS
National Library of Finland
Beate Rajski
Staats- und Universitätsbibliothek Hamburg – Carl von Ossietzky
Technische Universität Berlin
Technische Universität Hamburg (TUHH)
The DSpace-Konsortium Deutschland
The Helmut-Schmidt-Universität/Universität der Bundeswehr Hamburg
The Library Code GmbH
The Ohio State University
Texas Digital Library
University of Arizona
University of Edinburgh
University of Kansas
University of Minnesota
University of Missouri
University of Toronto
World Bank
ZHAW
Out of the above list, the following individuals contributed a translation of the new interface (ordered alphabetically by language): Ivan Masar (Czech),
Marina Muilwijk (Dutch), Reeta Kuuskoski (Finnish), David Cavrenne (French), Claudia Jürgen and Sasha Szott (German), Nagy Akos and Transylvanian
Museum Society (Hungarian), Mikus Zarins (Latvian), Vítor Silvério Rodrigues and marciofoz (Brazilian Portuguese), José Carvalho (Portuguese) and
Maria Fernanda Ruiz (Spanish).
The above contributor lists were determined based on historical contributions to the "dspace-angular" project in GitHub until 7.0: https://github.com/DSpace/dspace-angular/graphs/contributors?from=2016-11-27&to=2021-07-29&type=c
Backend / REST API Acknowledgments
The following 55 individuals have contributed directly to the DSpace backend (REST API, Java API, OAI-PMH, etc) in this release (ordered by number of
GitHub commits): Raf Ponsaerts (Raf-atmire), Tim Donohue (tdonohue), Andrea Bollini (abollini), Michele Boychuk (Micheleboychuk), Mark Wood
(mwoodiupui), Marie Verdonck (MarieVerdonck), Ben Bosman (benbosman), Luigi Andrea Pascarelli (lap82), Terry Brady (terrywbrady), Tom Desair
(tomdesair), Yana De Pauw (YanaDePauw), Chris Wilper (cwilper), Peter Nijs (peter-atmire), Kevin Van de Velde (KevinVdV), Bruno Roemers (bruno-
atmire), Giuseppe Digilio (atarix83), Pasquale Cavallo (pasqualecvl), Jelle Pelgrims (jpelgrims-atmire), Andrew Wood (AndrewZWood), Samuel Cambien
(samuelcambien), Antoine Snyers (antoine-atmire), Kim Shepherd (kshepherd), Yury Bondarenko (ybnd), Michael Spalti (mspalti), Alessandro Martelli
(alemarte), Oliver Goldschmidt (olli-gold), Jonas Van Goolen (jonas-atmire), Kristof De Langhe (Atmire-Kristof), Alexander Sulfrian (AlexanderS), Patrick
Trottier (PTrottier), Pablo Prieto (ppmdo), Hardy Pottinger (hardyoyo), Pascal-Nicolas Becker (pnbecker), William Tantzen (tantz001), Paulo Graça (paulo-
graca), Luca Giamminonni (LucaGiamminonni), Ivan Masar (helix84), Hrafn Malmquist (J4bbi), Ian Little (ilittle-cnri), Anis Moubarik (anis-moubarik),
Claudia Jürgen (cjuergen), Alan Orth (alanorth), xuejiangtao, Danilo Di Nuzzo (ddinuzzo), James Creel (jcreel), Marsa Haoua (marsaoua), Philip
Vissenaekens (PhilipVis), Miika Nurminen (minurmin), Bram Luyten (bram-atmire), Christian Scheible (christian-scheible), Nicholas Woodward
(nwoodward), József Marton (jmarton), Mohamed Mohideen Abdul Rasheed (mohideen), Saiful Amin (saiful-semantic), Àlex Magaz Graça (rivaldi8)
The above contributor list was determined based on contributions to the "DSpace" project in GitHub between 6.0 (after Oct 24, 2016) and 7.0: https://github.com/DSpace/DSpace/graphs/contributors?from=2016-10-24&to=2021-07-29&type=c. Therefore this list may include individuals who contributed to later 6.x releases, but only if their bug fix was also applied to 7.0.
Additional Thanks
Additional thanks to our DSpace Leadership Group and DSpace Steering Group for their ongoing DSpace support and advice. Thanks also to LYRASIS
for your leadership, collaboration & support in helping to speed up the development process of DSpace 7.
Thanks also to the various developer & community Working Groups who have worked diligently to help make DSpace 7 a reality. These include:
DSpace 7 Working Group (2016-2023) - This is the team behind the code
DSpace 7 Entities Working Group (2018-19) - This team designed & implemented Configurable Entities
DSpace 7 Marketing Working Group (2016-2020) - This team did all our DSpace 7 marketing, press releases & announcements.
DSpace Community Advisory Team (DCAT) - This team helped organize/lead the DSpace 7.0 Testathon (to bang on the system to find any last
bugs), and they also provided us with advice on features, etc.
We apologize to any contributor accidentally left off this list. DSpace has such a large, active development community that we sometimes lose track of all
our contributors. Acknowledgments to those left off will be made in future releases.
Included in Beta 5
Support for custom theme(s) in UI & accessibility cleanup of base theme. See early information at DSpace UI Design principles and
guidelines and the "themes" section of the environment.common.ts
Updated the "base" theme (default Bootstrap look & feel) for consistency and better accessibility. (Additional accessibility analysis will
be performed during Testathon)
Added a simple "dspace" theme (this is the new default theme, and primarily shows an example of customizing color scheme &
homepage)
Added a "custom" theme folder with all necessary files. These files can be directly modified to create a completely custom theme.
Major performance improvements to UI by making better use of caching & smart reloading
Video/Image Content Streaming (Kindly donated by Zoltán Kanász-Nagy and Dániel Péter Sipos of Qulto): When enabled, DSpace can now
stream videos & view images full screen, using an embedded viewer.
See the new "mediaViewer" settings in the environment.common.ts to enable. Sample screenshots of the feature can also be found at https://github.com/DSpace/dspace-angular/issues/885
New Administrative Features
Add ability to modify Community/Collection resource policies (i.e. permissions). Edit a Community or Collection and look at the
"Authorizations" tab.
Add ability to edit/delete user Groups.
Add private/withdrawn item badges for Administrators to quickly see which Items are private or withdrawn. These are viewable
throughout the browse/search when logged in as an Administrative user.
Configurable Entities Improvements
Entities now report their Entity type in the URL path (e.g. Person entities use URL path /entities/person/[uuid] and Publication entities
use the URL path /entities/publication/[uuid])
Each Entity type now has a custom Submission form.
These can be most easily seen in the Demo site. Submitting to the "People" collection uses the "Person" Entity
Form. Submitting to the "Articles" collection uses the "Publication" Entity Form. The full list of Entity-specific Collection
submission mappings can be found in the example in item-submission.xml (this example is enabled on our Demo Site)
General performance improvements for Entities. Introduction of "tilted" relationships for Configurable Entities that may have hundreds or
thousands of relationships.
Improvements to Upgrade process
Added a new Submission form migration script to help DSpace 5/6 institutions migrate their old Submission configuration files to the
new/updated format for v7.
Security fixes
Added CSRF (Cross-Site Request Forgery) protection to the REST API. The UI (and any other clients) now must be trusted in order to log in to
the REST API.
Improved permissions checks/validation in UI for Administrator, Community/Collection Administrator and Submitter roles.
Fixed several other security issues auto-reported by LGTM
Many bug fixes
Fixed issue where mapped items were not appearing
Fixed issue where Handles were not redirecting
Fixed issues with Sherpa and ORCID integrations
Fixed several small issues with OpenAIRE v4 support in OAI-PMH
Fixed many bugs in MyDSpace and Submission UI
Fixed several bugs in CSV import/export process.
Fixes to search/browse pagination & breadcrumb trail
Improved performance of Browse by Community/Collection hierarchy
LDAP Authentication support is working again
Many dependency upgrades
Upgrade UI to Angular v10
Upgrade UI to support Node v12 or v14
Upgrade Backend to support Solr v8
Upgrade to ORCID v3 support
Upgrade to SHERPA v2 support
Removal of obsolete features
Removal of the old BTE framework in favor of Live Import Framework (features of BTE have been ported to Live Import)
Removal of Traditional/Basic workflow in favor of Configurable Workflow (default workflow is still the same as in DSpace 6)
Changelog
Included in Beta 4
Live Import framework (video) support has been added to the Submission Form (and REST API /api/integration/externalsources
endpoint)
Search an external site for works to import (From your MyDSpace page, click the "Import metadata from external source" button in upper
right). Currently supports Library of Congress Names, ORCID, PubMed, Sherpa Journals or Sherpa Publishers.
Drag and drop a bibliographic file into Submission form or MyDSpace page to prepopulate metadata. Supported formats include ArXiv,
CSV (or TSV), Endnote, PubMed, or RIS.
Controlled Vocabulary support (video) in Submission Form. Depending on the field configuration, this can include autocomplete of known terms
(see default "Subject Keywords" field), dropdown support (see default "Type" field) and hierarchical tree views
Includes support for Controlled Vocabs, Authority Control and "Value-Pairs" (from submission configs)
Curation Tasks are now supported via the Admin UI and the Processes UI. (Login as an Admin, select "Curation Tasks")
Import / Export metadata from/to CSV (i.e. Batch Metadata Editing) is now available from the Admin UI. (Login as an Admin, select "Export" >
"Metadata", select "Import" > "Metadata")
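For illustration, the exported/imported CSV uses one row per item, an `id` column, and one column per metadata field, with `||` separating repeated values (a minimal, hypothetical sketch; the UUID and values below are invented, and the Batch Metadata Editing documentation remains the authoritative reference for the format):

```shell
# Minimal, hypothetical batch-metadata-editing CSV. The UUID and field
# values are invented for illustration only.
cat > example-metadata.csv <<'EOF'
id,dc.title,dc.contributor.author
1a2b3c4d-0000-0000-0000-000000000000,Example Title,"Smith, Jane||Doe, John"
EOF

# Two lines: a header row plus one item row.
wc -l < example-metadata.csv
```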
Basic Usage Statistics (video) are available for the entire site (See "Statistics" menu at top of homepage), or specific Communities, Collections
or Items (Click on that same "Statistics" menu after browsing to a specific object).
Support for exchanging usage data with IRUS was added. See the new "irus-statistics.cfg" and DS-626
Improved GDPR Alignment (video)
User Agreement required for all authenticated users to read and agree to. (Log in for the first time, and a sample user agreement will display.
After agreeing to it, it will not appear again.)
Cookie Preferences are now available for all users (anonymous or authenticated). A cookie preference popup appears when first
accessing the site. Users are given information on what cookies are added by DSpace, including a Privacy Statement which can be used to
describe how their data is used.
User Accounts can be deleted even if they've submitted content in the past.
When a user is deleted, their past submissions are kept but the submitter field is set to empty (null).
Users cannot be deleted if they are the only member of a workflow approval group. Admins must either delete that group first,
or assign another member to the group. This ensures Workflows are kept even if a user account needs to be deleted.
Language preferences are now kept for all users (anonymous or logged in). By default, DSpace will try to use your browser's preferred language
(if found in Accept-Language header and a translation in that language exists). Users can override it by either saving a preferred language in
their user profile, or by manually selecting a different language from the globe icon (upper right).
IP-based authorization lets you restrict (or provide access to) objects based on the user's IP address. This uses the same "authentication-ip.cfg"
configuration as DSpace 6, allowing you to map IP ranges to specific DSpace Groups. Users within that IP range are added to the mapped
DSpace Group for the remainder of their session.
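A hedged sketch of that mapping (the group name and IP ranges are invented; check the property syntax against your version's authentication-ip.cfg documentation):

```shell
# Hypothetical fragment of [dspace]/config/modules/authentication-ip.cfg,
# mapping IP ranges to an existing DSpace group named CAMPUS_USERS:
#
#   authentication.ip.CAMPUS_USERS = 10.1.0.0/16, 192.168.5
```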
Search Engine Optimization: Addition of robots.txt, Sitemaps and Google Scholar "citation" tags. These optimizations are being tested by the
Google Scholar team and may be improved further in the upcoming beta 5 release.
For improved SEO, Sitemaps are now enabled by default and automatically update once per day.
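If you need to refresh the sitemaps outside that daily schedule, the backend CLI has historically offered a sitemap generation script (a sketch only; confirm the script name and path for your 7.x version):

```shell
# Regenerate sitemaps on demand from the backend install directory
# (the path is illustrative; confirm the script exists in your version):
#
#   [dspace]/bin/dspace generate-sitemaps
```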
Security Fixes and Dependency upgrades
Enhancements to new /api/authz/features endpoint in REST API to provide additional feature-specific permission checks
Flyway database engine was upgraded to version 6.5.5
Indexing enhancements (some objects were being indexed twice, see PR#2960)
Fixes to Shibboleth login
Additional bug fixes to both UI and REST API
Changelog
All Backend changes: https://github.com/DSpace/DSpace/issues?q=is%3Aclosed+milestone%3A7.0beta4
Included in Beta 3
Processes Admin UI (video) allows Administrators to run backend scripts/processes while monitoring their progress & completion. (Login as an
Admin, select "Processes" in sidebar)
Currently supported processes include "index-discovery" (reindex site), "metadata-export" (batch metadata editing CSV export), and
"metadata-import" (batch metadata editing CSV import).
Manage Account Profile allows logged in users to update their name, language or password. (Login, click on the account icon, and select
"Profile")
New User Registration (video) and password reset on the Login Screen
Login As (Impersonate) another account allows Administrators to debug issues that a specific user is seeing, or do some work on behalf of that
user. (Login as an Admin, Click "Access Control" in sidebar, Click "People". Search for the user account & edit it. Click the "Impersonate
EPerson" button. You will be authenticated as that user until you click "Stop Impersonating EPerson" in the upper right.)
Requires "webui.user.assumelogin=true" to be set in your local.cfg on the backend. Also be aware that you can only "impersonate" a user
who is not a member of the Administrator group.
Manage Authorization Policies of an Item allows Administrators to directly change/update the access policies of an Item, its Bundles or
Bitstreams. (Login as an Admin, Click "Edit" "Item" in sidebar, and search for the Item. Click the "Authorization..." button on its "Status" tab.)
Manage Item Templates of a Collection allows Administrators to create/manage template metadata that all new Items will start with when
submitted to that Collection. (Login as an Admin, Click "Edit" "Collection" in sidebar and search for the Collection. Click the "Add" button under
"Template Item" to get started.)
NOTE: unfortunately there's a known bug where, while you can create these templates, the submission process is not yet using them. See https://github.com/DSpace/dspace-angular/issues/748
Administer Active Workflows (video) allows Administrators to see every submission that is currently in the workflow approval process. From
there, they have the option to delete Items (if they are no longer needed), or send them back to the workflow pool (to allow another user to review
them). (Login as an Admin, Click "Administer Workflow" in sidebar)
CC License step allows your users to select a Creative Commons License as part of their submission. Once enabled in the "item-submission.xml"
(on the backend) it appears as part of the submission form.
Angular CLI compatibility was added to the User Interface. This allows developers to easily update the User Interface using standard Angular
commandline tools. More information (including tutorials) is available at https://cli.angular.io/
English, Latvian, Dutch, German, French, Portuguese, Spanish and Finnish language catalogs
Numerous bugs were fixed based on early user testing. (Thanks to all who've tested Beta 1 or Beta 2 and reported your feedback!) Some bugs
fixed include:
Login/Logout session fixes (including compatibility with Firefox and Safari browsers)
Improved Community/Collection tree browsing performance
Fixes to editing Communities, Collections and Items. This includes improved drag & drop reordering of bitstreams in an Item.
Improved performance of Collection dropdown in submission
Ability to download restricted bitstreams (previously these would error out)
Authorization & security improvements in both REST API and UI
Upgraded all REST API dependencies (Spring, Spring Boot, HAL Browser) and enhanced our automated testing via additional Integration Tests.
All features previously mentioned in 7.0 Beta 2 Release Notes and 7.0 Beta 1 Release Notes below
Learn More: New videos are available highlighting features of the MyDSpace area:
Changelog
Included in Beta 2
Administrative Search (video) combines retrieval of withdrawn items and private items, together with a series of quick action buttons.
EPeople, Groups and Roles can now be viewed, created and updated.
Manage Groups (Login as an Admin, click "Access Control" > "Groups")
Manage EPeople (Login as an Admin, click "Access Control" > "EPeople")
Manage Community/Collection Roles (Login as an Admin, click "Edit" > "Community"/"Collection" > "Assign Roles"). Note: this feature is Admin-only in
beta 2, but will be extended to Community/Collection Admins in beta 3.
Bitstream Editing (video) has a drag-and-drop interface for re-ordering bitstreams and makes adding and editing bitstreams more intuitive.
Metadata Editing (video) introduces suggest-as-you-type for field name selection of new metadata.
Update Profile / Change Password (Login, select the user menu in the upper right, then "Profile")
Shibboleth Authentication
Viewing Item Version History (requires upgrading from a 6.x site that includes Item Versioning)
Collection and Community (video) creation and edit pages.
English, Latvian, Dutch, German, French, Portuguese and Spanish language catalogs
Security and authorization improvements, including REST API support hiding specific metadata fields (metadata.hide property) and upgrades of
different software packages on which DSpace 7 depends.
All features previously mentioned in 7.0 Beta 1 Release Notes below
A full list of all changes / bug fixes in 7.x is available in the Changes in 7.x section.
A completely new User Interface (demo site). This is the new Javascript-based frontend, built on Angular.io (with support for SEO provided by
Angular Universal). This new interface can also be themed via HTML and CSS (SCSS). For early theme building training, see the "Getting Started with DSpace
7 Workshop" from the North American User Group meeting: slides or video recording.
A completely new, fully featured REST API (demo site), provided via a single "server" webapp backend. This new backend is not only a REST
API, but also still supports OAI-PMH, SWORD (v1 or v2) and RDF. See the REST API's documentation / contract at https://github.com/DSpace/Rest7Contract/blob/master/README.md
A newly designed search box. Search from the header of any page (click the magnifying glass). The search results page now features
automatic search highlight, expandable & searchable filters, and optional thumbnail-based results (click on the “grid” view).
A new MyDSpace area, including a new, one-page, drag & drop submission form, a new workflow approval process, and searchable past
submissions. (Login, click on your user profile icon, click “MyDSpace”). Find workflow tasks to claim by selecting “All tasks” in the “Show”
dropdown.
Dynamic user interface translations (Click the globe, and select a language). Anyone interested in adding more translations? See DSpace 7
Translation - Internationalization (i18n) - Localization (l10n).
A new Admin sidebar. Login as an Administrator, and an administrative sidebar appears. Use this to create a new Community/Collection/Item,
edit existing ones, and manage registries. (NOTE: A number of Administrative tools are still missing or greyed out. They will be coming in future
Beta releases.)
Optional, new Configurable Entities feature. DSpace now supports “entities”, which are DSpace Items of a specific ‘type’ which may have
relationships to other entities. These entity types and relationships are configurable, with two examples coming out-of-the-box: a set of Journal
hierarchy entities (Journal, Volume, Issue, Publication) and a set of Research entities (Publication, Project, Person, OrgUnit). For more
information see "The Power of Configurable Entities" from OR2019: slides or video recording. Additionally, a test data set featuring both out-of-the-box examples can be used when trying out DSpace 7 via Docker. Early documentation is available at Configurable Entities.
Support for OpenAIREv4 Guidelines for Literature Repositories in OAI-PMH (See the new “openaire4” context in OAI-PMH).
Additional major changes to be aware of in the 7.x platform (not an exhaustive list):
XMLUI and JSPUI are no longer supported or distributed with DSpace. All users should immediately migrate to and utilize the new Angular
User Interface. There is no migration path from either the XMLUI or JSPUI to the new User interface. However, the new user interface can be
themed via HTML and CSS (SCSS).
The old REST API ("rest" webapp from DSpace v4.x-6.x) is deprecated and will be removed in v8.x. The new REST API (provided in the
"server" webapp) replaces all functionality available in the older REST API. If you have tools that rely on the old REST API, you can still
(optionally) build & deploy it alongside the "server" webapp via the "-Pdspace-rest" Maven flag.
The Submission Form configuration has changed. The "item-submission.xml" file has changed its structure, and the "input-forms.xml" has
been replaced by a "submission-forms.xml". For early documentation see Configuration changes in the submission process
ElasticSearch Usage Statistics have been removed. Please use SOLR Statistics or DSpace Google Analytics Statistics.
The traditional, 3-step Workflow system has been removed in favor of the Configurable Workflow System. For most users, you should see
no effect or difference. The default setup for this Configurable Workflow System is identical to the traditional, 3-step workflow ("Approve/Reject",
"Approve/Reject/Edit Metadata", "Edit Metadata")
Apache Solr is no longer embedded within the DSpace installer (and has been upgraded to Solr v7). Solr now MUST be installed as a
separate dependency alongside the DSpace backend. See Installing DSpace.
Some command-line tools/scripts are enabled in the new REST API (e.g. index-discovery): see the new Scripts endpoint: https://github.com/DSpace/Rest7Contract/blob/master/scripts-endpoint.md
DSpace now has a single, backend "server" webapp to deploy in Tomcat (or similar). In DSpace 6.x and below, different machine interfaces
(OAI-PMH, SWORD v1 or v2, RDF, REST API) were provided via separate deployable webapps. Now, all those interfaces along with the new
REST API are in a single, "server" webapp built on Spring Boot. You can now control which interfaces are enabled, and what path they appear
on via configuration (e.g. "oai.enabled=true" and "oai.path=oai"). See https://jira.lyrasis.org/browse/DS-4257
Configuration has been upgraded to Apache Commons Configuration version 2. For most users, you should see no effect or difference. No
DSpace configuration files were modified during this upgrade and no configurations or settings were renamed or changed. However, if you locally
modified or customized the [dspace]/config/config-definition.xml (DSpace's Apache Commons Configuration settings), you will need
to ensure those modifications are compatible with Apache Commons Configuration version 2. See the Apache Commons Configuration's configuration definition file reference for more details.
Handle Server has been upgraded to version 9.x : https://jira.lyrasis.org/browse/DS-4205
DSpace now has sample Docker images (configurations) which can be used to try out DSpace quickly. See Try out DSpace 7 ("Install via
Docker" section).
Functional Overview
The following sections describe the various functional aspects of the DSpace system.
Full-text search
DSpace can process uploaded text based contents for full-text searching. This means that not only the metadata you provide for a given file will be
searchable, but all of its contents will be indexed as well. This allows users to search for specific keywords that only appear in the actual content and not in
the provided description.
Navigation
DSpace allows users to find their way to relevant content in a number of ways, including search and browse.
Another important mechanism for discovery in DSpace is the browse. This is the process whereby the user views a particular index, such as the title index,
and navigates around it in search of interesting items. The browse subsystem provides a simple API for achieving this by allowing a caller to specify an
index, and a subsection of that index. The browse subsystem then discloses the portion of the index of interest. Indices that may be browsed are item title,
item issue date, item author, and subject terms. Additionally, the browse can be limited to items within a particular collection or community.
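The browse subsystem's "specify an index and a subsection of that index" idea can be pictured with a small sketch. All function names, index data, and parameters here are invented for illustration; this is not DSpace's actual browse API.

```python
# Hypothetical sketch of a browse call: the caller names an index and a
# window into it, and the subsystem discloses that portion of the index.
INDEXES = {
    "title": ["A Study of X", "Introduction to Y", "Notes on Z"],
    "author": ["Doe, Jane", "Roe, Richard", "Smith, John"],
}

def browse(index, offset=0, limit=20, scope=None):
    """Return a subsection of the named browse index.

    `scope` could restrict results to one collection or community;
    it is ignored in this simplified sketch.
    """
    entries = INDEXES[index]
    return entries[offset:offset + limit]

print(browse("author", offset=1, limit=2))  # a window into the author index
```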
Files that have been uploaded to DSpace are often referred to as "Bitstreams". The reason for this is mainly historic and tracks back to the technical
implementation. After ingestion, files in DSpace are stored on the file system as a stream of bits without the file extension.
By default, DSpace only recognizes specific file types, as defined in its Bitstream Format Registry. The default Bitstream Format Registry recognizes
many common file formats, but it can be enhanced at your local institution via the Admin User Interface.
OpenURL Support
DSpace supports the OpenURL protocol in a rather simple fashion. If your institution has an SFX server, DSpace will display an OpenURL link on every
item page, automatically using the Dublin Core metadata. Additionally, DSpace can respond to incoming OpenURLs. Presently it simply passes the
information in the OpenURL to the search subsystem. A list of results is then displayed, which usually gives the relevant item (if it is in DSpace) at the top
of the list.
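The outgoing side of this support amounts to building a resolver link from an item's Dublin Core metadata. A minimal sketch follows; the resolver URL and the parameter subset are placeholders, not DSpace's exact mapping.

```python
from urllib.parse import urlencode

def openurl_link(resolver_base, dc):
    """Build a simple OpenURL-style link from Dublin Core metadata.

    `resolver_base` stands in for an institution's SFX/link-resolver
    URL; the parameter names are an illustrative subset only.
    """
    params = {}
    if "title" in dc:
        params["title"] = dc["title"]
    if "creator" in dc:
        params["aulast"] = dc["creator"].split(",")[0]  # surname only
    if "date" in dc:
        params["date"] = dc["date"]
    return resolver_base + "?" + urlencode(params)

print(openurl_link("https://sfx.example.edu/sfx",
                   {"title": "On Repositories", "creator": "Smith, John", "date": "2020"}))
```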
The DSpace developer community aims to rely on modern web standards and well tested libraries where possible. As a rule of thumb, users can expect
that the DSpace web interfaces work on modern web browsers. DSpace developers routinely test new interface developments on recent versions of
Firefox, Safari, Chrome and Microsoft Edge. Because of fast moving, automatic, incremental updates to these browsers, support is no longer targeted at
specific versions of these browsers. (Please note that we do not recommend or support using Internet Explorer as it is considered "end of life" by
Microsoft.)
Metadata Management
Metadata
Broadly speaking, DSpace holds three sorts of metadata about archived content:
Descriptive Metadata: DSpace can support multiple flat metadata schemas for describing an item. A qualified Dublin Core metadata schema
loosely based on the Library Application Profile set of elements and qualifiers is provided by default. This default schema is described in more
detail in Metadata and Bitstream Format Registries. However, you can configure multiple schemas and select metadata fields from a mix of
configured schemas to describe your items. Other descriptive metadata about items (e.g. metadata described in a hierarchical schema) may be
held in serialized bitstreams.
Administrative Metadata: This includes preservation metadata, provenance and authorization policy data. Most of this is held within DSpace's
relational DBMS schema. Provenance metadata (prose) is stored in Dublin Core records. Additionally, some other administrative metadata (for
example, bitstream byte sizes and MIME types) is replicated in Dublin Core records so that it is easily accessible outside of DSpace.
Structural Metadata: This includes information about how to present an item, or bitstreams within an item, to an end-user, and the relationships
between constituent parts of the item. As an example, consider a thesis consisting of a number of TIFF images, each depicting a single page of
the thesis. Structural metadata would include the fact that each image is a single page, and the ordering of the TIFF images/pages. Structural
metadata in DSpace is currently fairly basic; within an item, bitstreams can be arranged into separate bundles as described above. A bundle may
also optionally have a primary bitstream. This is currently used by the HTML support to indicate which bitstream in the bundle is the first HTML file
to send to a browser. In addition to some basic technical metadata, a bitstream also has a 'sequence ID' that uniquely identifies it within an item.
This is used to produce a 'persistent' bitstream identifier for each bitstream. Additional structural metadata can be stored in serialized bitstreams,
but DSpace does not currently understand this natively.
Definitions
Choice Management
This is a mechanism that generates a list of choices for a value to be entered in a given metadata field. Depending on your implementation, the exact
choice list might be determined by a proposed value or query, or it could be a fixed list that is the same for every query. It may also be closed (limited to
choices produced internally) or open, allowing the user-supplied query to be included as a choice.
Authority Control
This works in addition to choice management to supply an authority key along with the chosen value, which is also assigned to the Item's metadata field
entry. Any authority-controlled field is also inherently choice-controlled.
1. There is a simple and positive way to test whether two values are identical, by comparing authority keys.
Comparing plain text values can give false positive results e.g. when two different people have a name that is written the same.
It can also give false negative results when the same name is written different ways, e.g. "J. Smith" vs. "John Smith".
2. Help in entering correct metadata values. The submission and admin UIs may call on the authority to check a proposed value and list possible
matches to help the user select one.
3. Improved interoperability. By sharing a name authority with another application, your DSpace can interoperate more cleanly with other
applications.
For example, a DSpace institutional repository sharing a naming authority with the campus social network would let the social network
construct a list of all DSpace Items matching the shared author identifier, rather than by error-prone name matching.
When the name authority is shared with a campus directory, DSpace can look up the email address of an author to send automatic email
about works of theirs submitted by a third party. That author does not have to be an EPerson.
4. Authority keys are normally invisible in the public web UIs. They are only seen by administrators editing metadata. The value of an authority key is
not expected to be meaningful to an end-user or site visitor.
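The first benefit above, a positive identity test, can be illustrated with a short sketch. The names and authority keys are invented for the example.

```python
# Illustrative only: why comparing authority keys beats comparing names.
records = [
    {"value": "J. Smith",   "authority": "person:0001"},
    {"value": "John Smith", "authority": "person:0001"},  # same person, different spelling
    {"value": "John Smith", "authority": "person:0002"},  # different person, same spelling
]

def same_person(a, b):
    """Authority keys give a simple, positive identity test."""
    return a["authority"] == b["authority"]

# Plain-text comparison gets both cases wrong:
assert records[0]["value"] != records[1]["value"]   # false negative
assert records[1]["value"] == records[2]["value"]   # false positive
# Authority keys get both right:
assert same_person(records[0], records[1])
assert not same_person(records[1], records[2])
print("authority keys disambiguate correctly")
```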
Authority control is different from the controlled vocabulary of keywords already implemented in the submission UI:
1. Authorities are external to DSpace. The source of authority control is typically an external database or network resource.
Plug-in architecture makes it easy to integrate new authorities without modifying any core code.
2. This authority proposal impacts all phases of metadata management.
The keyword vocabularies are only for the submission UI.
Authority control is asserted everywhere metadata values are changed, including unattended/batch submission, SWORD package
submission, and the administrative UI.
Some Terminology
Authority: An authority is a source of fixed values for a given domain, each unique value identified by a key.
Authority Record: The information associated with one of the values in an authority; may include alternate spellings and equivalent forms of the value, etc.
Authority Key: An opaque, hopefully persistent, identifier corresponding to exactly one record in the authority.
Licensing
DSpace offers support for licenses on different levels.
Handles
Researchers require a stable point of reference for their works. The simple evolution from sharing of citations to emailing of URLs broke when Web users
learned that sites can disappear or be reconfigured without notice, and that their bookmark files containing critical links to research results couldn't be
trusted in the long term. To help solve this problem, a core DSpace feature is the creation of a persistent identifier for every item, collection and community
stored in DSpace. To persist identifiers, DSpace requires a storage- and location- independent mechanism for creating and maintaining identifiers. DSpace
uses the CNRI Handle System for creating these identifiers. The rest of this section assumes a basic familiarity with the Handle system.
DSpace uses Handles primarily as a means of assigning globally unique identifiers to objects. Each site running DSpace needs to obtain a unique Handle
'prefix' from CNRI, so we know that if we create identifiers with that prefix, they won't clash with identifiers created elsewhere.
Presently, Handles are assigned to communities, collections, and items. Bundles and bitstreams are not assigned Handles, since over time, the way in
which an item is encoded as bits may change, in order to allow access with future technologies and devices. Older versions may be moved to off-line
storage as a new standard becomes de facto. Since it's usually the item that is being preserved, rather than the particular bit encoding, it only makes
sense to persistently identify and allow access to the item, and allow users to access the appropriate bit encoding from there.
Of course, it may be that a particular bit encoding of a file is explicitly being preserved; in this case, the bitstream could be the only one in the item, and the
item's Handle would then essentially refer just to that bitstream. The same bitstream can also be included in other items, and thus would be citable as part
of a greater item, or individually.
The Handle system also features a global resolution infrastructure; that is, an end-user can enter a Handle into any service (e.g. Web page) that can
resolve Handles, and the end-user will be directed to the object (in the case of DSpace, community, collection or item) identified by that Handle. In order to
take advantage of this feature of the Handle system, a DSpace site must also run a 'Handle server' that can accept and resolve incoming resolution
requests. All the code for this is included in the DSpace source code bundle.
hdl:1721.123/4567
http://hdl.handle.net/1721.123/4567
The above represent the same Handle. The first is possibly more convenient to use only as an identifier; however, by using the second form, any Web
browser becomes capable of resolving Handles. An end-user need only access this form of the Handle as they would any other URL. It is possible to
enable some browsers to resolve the first form of Handle as if they were standard URLs using CNRI's Handle Resolver plug-in, but since the first form can
always be simply derived from the second, DSpace displays Handles in the second form, so that it is more useful for end-users.
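Because the URL form is mechanically derivable from the identifier form, the conversion can be sketched in a few lines:

```python
def to_actionable_url(handle):
    """Turn a 'hdl:prefix/local' identifier into the resolvable URL form.

    The mapping is purely mechanical, which is why DSpace can always
    display the URL form:
    hdl:1721.123/4567 -> http://hdl.handle.net/1721.123/4567
    """
    if handle.startswith("hdl:"):
        handle = handle[len("hdl:"):]
    return "http://hdl.handle.net/" + handle

print(to_actionable_url("hdl:1721.123/4567"))
```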
It is important to note that DSpace uses the CNRI Handle infrastructure only at the 'site' level. For example, in the above example, the DSpace site has
been assigned the prefix '1721.123'. It is still the responsibility of the DSpace site to maintain the association between a full Handle (including the '4567'
local part) and the community, collection or item in question.
Each bitstream has a sequence ID, unique within an item. This sequence ID is used to create a persistent ID, of the form:
For example:
https://dspace.myu.edu/bitstream/123.456/789/24/foo.html
The above refers to the bitstream with sequence ID 24 in the item with the Handle hdl:123.456/789. The foo.html is really just there as a hint to browsers:
Although DSpace will provide the appropriate MIME type, some browsers only function correctly if the file has an expected extension.
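The pieces of such a persistent bitstream URL can be picked apart as below. The path layout (/bitstream/&lt;prefix&gt;/&lt;local part&gt;/&lt;sequence ID&gt;/&lt;filename&gt;) is inferred from the example above; treat it as illustrative rather than a guaranteed contract.

```python
def parse_bitstream_url(url):
    """Split a persistent bitstream URL of the shape shown above into
    its Handle, sequence ID, and filename hint (layout inferred from
    the documented example)."""
    path = url.split("/bitstream/", 1)[1]
    prefix, local, sequence, filename = path.split("/", 3)
    return {
        "handle": f"hdl:{prefix}/{local}",
        "sequence_id": int(sequence),
        "filename": filename,  # a hint for browsers, not used for lookup
    }

print(parse_bitstream_url("https://dspace.myu.edu/bitstream/123.456/789/24/foo.html"))
```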
The batch item importer is an application which turns an external SIP (an XML metadata document with some content files) into an "in progress
submission" object. The Web submission UI is similarly used by an end-user to assemble an "in progress submission" object.
Depending on the policy of the collection to which the submission is targeted, a workflow process may be started. This typically allows one or more human
reviewers or 'gatekeepers' to check over the submission and ensure it is suitable for inclusion in the collection.
When the Batch Ingester or Submission UI completes the InProgressSubmission object, and invokes the next stage of ingest (be that workflow or item
installation), a provenance message is added to the Dublin Core which includes the filenames and checksums of the content of the submission. Likewise,
each time a workflow changes state (e.g. a reviewer accepts the submission), a similar provenance statement is added. This allows us to track how the
item has changed since a user submitted it.
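The idea of these provenance statements, recording who acted, when, and the name and checksum of each file, can be sketched as follows. The exact wording DSpace writes into the Dublin Core record differs; this is only an illustration of the principle.

```python
import hashlib
from datetime import datetime, timezone

def provenance_message(actor, files):
    """Compose a provenance note in the spirit of the ones added at each
    ingest stage: actor, timestamp, and per-file name/size/checksum.
    Wording is illustrative, not DSpace's exact format."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    lines = [f"Submitted by {actor} on {stamp}"]
    for name, content in files:
        digest = hashlib.md5(content).hexdigest()
        lines.append(f"  {name}: {len(content)} bytes, checksum (MD5) {digest}")
    return "\n".join(lines)

print(provenance_message("jsmith@example.edu",
                         [("thesis.pdf", b"%PDF-1.4 ..."), ("data.csv", b"a,b\n1,2\n")]))
```

Appending a similar statement at every workflow state change is what lets an administrator reconstruct how an item changed after submission.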
Once any workflow process is successfully and positively completed, the InProgressSubmission object is consumed by an "item installer", that converts the
InProgressSubmission into a fully blown archived item in DSpace. The item installer:
Adds an issue date if none already present
Adds a provenance message (including bitstream checksums)
Assigns a Handle persistent identifier
Adds the item to the target collection, and adds appropriate authorization policies
Adds the new item to the search and browse index
Workflow Steps
By default, a collection's workflow may have up to three steps. Each collection may have an associated e-person group for performing each step; if no
group is associated with a certain step, that step is skipped. If a collection has no e-person groups associated with any step, submissions to that collection
are installed straight into the main archive. Keep in mind, however, that this is only the default behavior, and the workflow process can be configured
/customized easily, see Configurable Workflow.
In other words, the default sequence is this: The collection receives a submission. If the collection has a group assigned for workflow step 1, that step is
invoked, and the group is notified. Otherwise, workflow step 1 is skipped. Likewise, workflow steps 2 and 3 are performed if and only if the collection has a
group assigned to those steps.
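The default dispatch logic described in the last two paragraphs can be sketched compactly. Function and group names are invented; the real implementation lives in the Configurable Workflow System.

```python
def run_workflow(step_groups):
    """Sketch of the default three-step dispatch: a step is invoked only
    if the collection has a group assigned to it; otherwise it is skipped.

    `step_groups` maps step number (1-3) to a group name or None.
    """
    invoked = []
    for step in (1, 2, 3):
        group = step_groups.get(step)
        if group is not None:
            invoked.append((step, group))  # task enters this group's pool
    if not invoked:
        return "installed directly into the archive", invoked
    return "routed through workflow", invoked

print(run_workflow({1: None, 2: "Editors", 3: None}))  # only step 2 runs
print(run_workflow({}))                                # no groups: straight to archive
```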
When a step is invoked, the submission is put into the 'task pool' of the step's associated group. One member of that group takes the task from the pool,
and it is then removed from the task pool, to avoid the situation where several people in the group may be performing the same task without realizing it.
The member of the group who has taken the task from the pool may then perform one of three actions:
edit (step 2): Can edit metadata provided by the user with the submission, but cannot change the submitted files. Can accept submission for inclusion, or reject submission.
finaledit (step 3): Can edit metadata provided by the user with the submission, but cannot change the submitted files. Must then commit to archive; may not reject submission.
If a submission is 'accepted', it is passed to the next step in the workflow. If there are no more workflow steps with associated groups, the submission is
installed in the main archive.
One last possibility is that a workflow can be 'aborted' by a DSpace site administrator. This is accomplished using the Administration UI.
DSpace also includes various package importer tools, which support many common content packaging formats like METS. For more information see Package Importer and Exporter. Additionally, DSpace can import/export Archival Information Packages (AIPs), see AIP Backup and Restore.
Registration is an alternate means of incorporating items, their metadata, and their bitstreams into DSpace by taking advantage of the bitstreams already
being in accessible computer storage. An example might be that there is a repository for existing digital assets. Rather than using the normal interactive
ingest process or the batch import to furnish DSpace the metadata and to upload bitstreams, registration provides DSpace the metadata and the location
of the bitstreams. DSpace uses a variation of the import tool to accomplish registration.
SWORD Support
SWORD (Simple Web-service Offering Repository Deposit) is a protocol that allows the remote deposit of items into repositories. SWORD was further
developed in SWORD version 2 to add the ability to retrieve, update, or delete deposits. DSpace supports the SWORD protocol via the 'sword' web
application and SWORD v2 via the 'swordv2' web application. The specification and further information can be found at http://swordapp.org. See also SWORDv1 Server and SWORDv2 Server.
OAI Support
The Open Archives Initiative has developed a protocol for metadata harvesting. This allows sites to programmatically retrieve or 'harvest' the metadata
from several sources, and offer services using that metadata, such as indexing or linking services. Such a service could allow users to access information
from a large number of sites from one place.
DSpace exposes the Dublin Core metadata for items that are publicly (anonymously) accessible. Additionally, the collection structure is also exposed via
the OAI protocol's 'sets' mechanism. OCLC's open source OAICat framework is used to provide this functionality.
You can also configure the OAI service to make use of any crosswalk plugin to offer additional metadata formats, such as MODS.
DSpace's OAI service does support the exposing of deletion information for withdrawn items, but not for items that are 'expunged' (see above). DSpace
also supports OAI-PMH resumption tokens. See OAI for more information.
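A harvesting request against this service is just an HTTP GET with the verb and parameters defined by the OAI-PMH specification. The sketch below only constructs the request URL; the base URL is a placeholder for your site's OAI endpoint path, which depends on your configuration.

```python
from urllib.parse import urlencode

def oai_request(base_url, verb, **kwargs):
    """Build an OAI-PMH request URL.

    `verb` and the keyword parameters (metadataPrefix, set,
    resumptionToken, ...) come from the OAI-PMH spec; `base_url`
    is a placeholder for the site's actual OAI endpoint.
    """
    params = {"verb": verb, **kwargs}
    return base_url + "?" + urlencode(params)

# Harvest Dublin Core records, optionally scoped to one 'set' (collection):
print(oai_request("https://dspace.example.edu/server/oai/request",
                  "ListRecords", metadataPrefix="oai_dc"))
```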
Signposting
DSpace supports the FAIR Signposting Profile at Level 2. Supporting this profile demonstrates a commitment to improving the machine accessibility,
interoperability, and reusability of scholarly resources. It ensures that the information you provide is standardized,
consistent, and easily navigable by both human users and machine agents, contributing to a more efficient and FAIR scholarly web ecosystem. For more
information see Signposting.
DSpace also includes various package exporter tools, which support many common content packaging formats like METS. For more information see Package Importer and Exporter. Additionally, DSpace can import/export Archival Information Packages (AIPs), see AIP Backup and Restore.
Packager Plugins
Packagers are software modules that translate between DSpace Item objects and a self-contained external representation, or "package". A Package
Ingester interprets, or ingests, the package and creates an Item. A Package Disseminator writes out the contents of an Item in the package format.
A package is typically an archive file such as a Zip or "tar" file, including a manifest document which contains metadata and a description of the package
contents. The IMS Content Package is a typical packaging standard. A package might also be a single document or media file that contains its own
metadata, such as a PDF document with embedded descriptive metadata.
Package ingesters and package disseminators are each a type of named plugin (see Plugin Manager), so it is easy to add new packagers specific to the
needs of your site. You do not have to supply both an ingester and disseminator for each format; it is perfectly acceptable to just implement one of them.
Most packager plugins call upon Crosswalk Plugins to translate the metadata between DSpace's object model and the package format.
More information about calling Packagers to ingest or disseminate content can be found in the Package Importer and Exporter section of the System
Administration documentation.
Crosswalk Plugins
Crosswalks are software modules that translate between DSpace object metadata and a specific external representation. An Ingestion Crosswalk interprets
the external format and crosswalks it to DSpace's internal data structure, while a Dissemination Crosswalk does the opposite.
For example, a MODS ingestion crosswalk translates descriptive metadata from the MODS format to the metadata fields on a DSpace Item. A MODS
dissemination crosswalk generates a MODS document from the metadata on a DSpace Item.
Crosswalk plugins are named plugins (see Plugin Manager), so it is easy to add new crosswalks. You do not have to supply both an ingester and
disseminator for each format; it is perfectly acceptable to just implement one of them.
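The named-plugin lookup that makes this extensibility work can be sketched as a small registry keyed by format name. The registry mechanics and the toy MODS mapping below are invented for illustration; DSpace's real Plugin Manager is configured in Java, not like this.

```python
# Sketch of a named-plugin registry in the spirit of the Plugin Manager:
# crosswalks are registered under a format name and looked up by it.
CROSSWALKS = {}

def register(name):
    """Decorator that files a crosswalk function under a format name."""
    def wrap(fn):
        CROSSWALKS[name] = fn
        return fn
    return wrap

@register("MODS")
def mods_dissemination(item_metadata):
    """Toy dissemination crosswalk: internal fields -> a MODS-ish dict."""
    return {"titleInfo": {"title": item_metadata.get("dc.title", "")}}

def disseminate(format_name, item_metadata):
    """Look up the crosswalk for a format and apply it to one item."""
    return CROSSWALKS[format_name](item_metadata)

print(disseminate("MODS", {"dc.title": "On Repositories"}))
```

Adding a new format is then just registering one more function; nothing else needs to know about it, which mirrors why "it is easy to add new crosswalks."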
There is also a special pair of crosswalk plugins which use XSL stylesheets to translate the external metadata to or from an internal DSpace format. You
can add and modify XSLT crosswalks simply by editing the DSpace configuration and the stylesheets, which are stored in files in the DSpace installation
directory.
The Packager plugins and OAI-PMH server make use of crosswalk plugins.
Once the default set has been applied, a system administrator may modify them as they would any other policy set in DSpace.
This functionality could also be used in situations where researchers wish to collaborate on a particular submission, although there is no particular
collaborative workspace functionality.
User Management
Although many of DSpace's functions such as document discovery and retrieval can be used anonymously, some features (and perhaps some documents)
are only available to certain "privileged" users. E-People and Groups are the way DSpace identifies application users for the purpose of granting privileges.
This identity is bound to a session of a DSpace application such as the Web UI or one of the command-line batch programs. Both E-People and Groups
are granted privileges by the authorization system described below.
DSpace holds the following information about each e-person:
E-mail address
First and last names
Whether the user is able to log in to the system via the Web UI, and whether they must use an X509 certificate to do so;
A password (encrypted), if appropriate
A list of collections for which the e-person wishes to be notified of new items
Whether the e-person 'self-registered' with the system; that is, whether the system created the e-person record automatically as a result of the
end-user independently registering with the system, as opposed to the e-person record being generated from the institution's personnel database,
for example.
The network ID for the corresponding LDAP record, if LDAP authentication is used for this E-Person.
Subscriptions
Not yet implemented. This is listed in Tier 3 (see #6 in Tier 3): DSpace Release 7.0 Status#Tier3:MediumPriority
As noted above, end-users (e-people) may 'subscribe' to collections in order to be alerted when new items appear in those collections. Each day, end-
users who are subscribed to one or more collections will receive an e-mail giving brief details of all new items that appeared in any of those collections the
previous day. If no new items appeared in any of the subscribed collections, no e-mail is sent. Users can unsubscribe themselves at any time. RSS feeds
of new items are also available for collections and communities.
Groups
Groups are another kind of entity that can be granted permissions in the authorization system. A group is usually an explicit list of E-People; anyone
identified as one of those E-People also gains the privileges granted to the group.
However, an application session can be assigned membership in a group without being identified as an E-Person. For example, some sites use this
feature to identify users of a local network so they can read restricted materials not open to the whole world. Sessions originating from the local network
are given membership in the "LocalUsers" group and gain the corresponding privileges.
Administrators can also use groups as "roles" to manage the granting of privileges more efficiently.
Access Control
Authentication
Authentication is when an application session positively identifies itself as belonging to an E-Person and/or Group. In DSpace, it is implemented by a
mechanism called Stackable Authentication: the DSpace configuration declares a "stack" of authentication methods. An application (like the Web UI) calls
on the Authentication Manager, which tries each of these methods in turn to identify the E-Person to which the session belongs, as well as any extra
Groups. The E-Person authentication methods are tried in turn until one succeeds. Every authenticator in the stack is given a chance to assign extra
Groups. This mechanism offers the following advantages:
Separates authentication from the Web user interface so the same authentication methods are used for other applications such as non-interactive
Web Services
Improved modularity: The authentication methods are all independent of each other. Custom authentication methods can be "stacked" on top of
the default DSpace username/password method.
Cleaner support for "implicit" authentication where username is found in the environment of a Web request, e.g. in an X.509 client certificate.
Authorization
DSpace's authorization system is based on associating actions with objects and the lists of EPeople who can perform them. The associations are called
Resource Policies, and the lists of EPeople are called Groups. There are two built-in groups: 'Administrators', who can do anything in a site, and
'Anonymous', which is a list that contains all users. Assigning a policy for an action on an object to anonymous means giving everyone permission to do
that action. (For example, most objects in DSpace sites have a policy of 'anonymous' READ.) Permissions must be explicit - lack of an explicit permission
results in the default policy of 'deny'. Permissions also do not 'commute'; for example, if an e-person has READ permission on an item, they might not
necessarily have READ permission on the bundles and bitstreams in that item. Currently Collections, Communities and Items are discoverable in the
browse and search systems regardless of READ authorization.
Collection
DEFAULT_BITSTREAM_READ: inherited as READ by Bitstreams of all submitted items. Note: this only affects Bitstreams of an item at the time it is initially submitted. If a Bitstream is added later, it does not get the same default read policy.
COLLECTION_ADMIN: collection admins can edit items in a collection, withdraw items, and map other items into this collection.
Item
Bundle
Bitstream
Note that there is no 'DELETE' action. In order to 'delete' an object (e.g. an item) from the archive, one must have REMOVE permission on all objects (in
this case, collection) that contain it. The 'orphaned' item is automatically deleted.
Usage Metrics
DSpace is equipped with SOLR based infrastructure to log and display pageviews and file downloads.
File Downloads information is only displayed for item-level statistics. Note that downloads from separate bitstreams are also recorded and represented
separately. DSpace is able to capture and store File Download information, even when the bitstream was downloaded from a direct link on an external
website.
System Statistics
Various statistical reports about the contents and use of your system can be automatically generated by the system. These are generated by analyzing
DSpace's log files. Statistics can be broken down monthly.
Digital Preservation
Checksum Checker
The purpose of the checker is to verify that the content in a DSpace repository has not become corrupted or been tampered with. The functionality can be
invoked on an ad-hoc basis from the command line, or configured via cron or similar. Options exist to support large repositories that cannot be entirely
checked in one run of the tool. The tool is extensible to new reporting and checking priority approaches.
System Design
Data Model
Data Model Diagram
The way data is organized in DSpace is intended to reflect the structure of the organization using the DSpace system. Each DSpace site is divided into
communities, which can be further divided into sub-communities reflecting the typical university structure of college, department, research center, or laboratory.
Communities contain collections, which are groupings of related content. A collection may only appear in one community at this time.
Each collection is composed of items, which are the basic archival elements of the archive. Each item is owned by one collection. An item may also
appear in additional collections; however, every item has one and only one owning collection.
Items are further subdivided into named bundles of bitstreams. Bitstreams are, as the name suggests, streams of bits, usually ordinary computer files.
Bitstreams that are somehow closely related, for example HTML files and images that compose a single HTML document, are organized into bundles.
Each bitstream is associated with one Bitstream Format. Because preservation services may be an important aspect of the DSpace service, it is important
to capture the specific formats of files that users submit. In DSpace, a bitstream format is a unique and consistent way to refer to a particular file format. An
integral part of a bitstream format is an either implicit or explicit notion of how material in that format can be interpreted. For example, the interpretation for
bitstreams encoded in the JPEG standard for still image compression is defined explicitly in the Standard ISO/IEC 10918-1. The interpretation of
bitstreams in Microsoft Word 2000 format is defined implicitly, through reference to the Microsoft Word 2000 application. Bitstream formats can be more
specific than MIME types or file suffixes. For example, application/ms-word and .doc span multiple versions of the Microsoft Word application, each of
which produces bitstreams with presumably different characteristics.
Each bitstream format additionally has a support level, indicating how well the hosting institution is likely to be able to preserve content in the format in the
future. There are three possible support levels that bitstream formats may be assigned by the hosting institution. The host institution should determine the
exact meaning of each support level, after careful consideration of costs and requirements. MIT Libraries' interpretation is shown below:
Supported: The format is recognized, and the hosting institution is confident it can make bitstreams of this format usable in the future, using whatever
combination of techniques (such as migration, emulation, etc.) is appropriate given the context of need.
Known: The format is recognized, and the hosting institution will promise to preserve the bitstream as-is, and allow it to be retrieved. The hosting
institution will attempt to obtain enough information to enable the format to be upgraded to the 'supported' level.
Unsupported: The format is unrecognized, but the hosting institution will undertake to preserve the bitstream as-is and allow it to be retrieved.
Each item has one qualified Dublin Core metadata record. Other metadata might be stored in an item as a serialized bitstream, but we store Dublin Core
for every item for interoperability and ease of discovery. The Dublin Core may be entered by end-users as they submit content, or it might be derived from
other metadata as part of an ingest process.
Items can be removed from DSpace in one of two ways. They may be 'withdrawn', which means they remain in the archive but are completely hidden from
view; in this case, if an end-user attempts to access the withdrawn item, they are presented with a 'tombstone' that indicates the item has been removed.
Alternatively, an item may be 'expunged' if necessary, in which case all traces of it are removed from the archive.
Object Example
Item A technical report; a data set with accompanying description; a video recording of a lecture
Bitstream A single HTML file; a single image file; a source code file
Bitstream Format Microsoft Word version 6.0; JPEG encoded image format
Amazon S3 Support
DSpace offers two means for storing bitstreams. The first is in the file system on the server. The second is using Amazon S3. For more information, see Storage Layer.
Technology Overview
DSpace open source software is free to use, and community supported.
DSpace consists of both a frontend (User Interface) and a backend (REST API & other machine interfaces). A brief overview of the technologies used for
each is provided below.
The DSpace Frontend is built on the Angular platform, written in the TypeScript language. It uses Bootstrap and HTML5 for theming/styling and strives
for WCAG 2.1 AA alignment. The frontend also uses Angular Universal for "server-side rendering", which allows it to function even when Javascript is
unavailable in the browser. For more information on Angular Universal, see the Angular University guide.
More information on installing the DSpace Frontend can be found in the Installing DSpace guide.
The DSpace Backend is built on Spring Boot, written in Java. The REST API portion of the backend is built on Spring Technologies, including Spring
REST, Spring HATEOAS, and aligns with Spring Data REST. The REST API uses the Spring Data REST Hal Browser as a basic web interface for
exploring the REST API. All REST API responses are returned in JSON format.
The DSpace Backend requires a relational database (usually PostgreSQL), used to store all the metadata and relationships between objects. All files
uploaded into DSpace are stored on the filesystem (any operating system is supported). Apache Solr is also required, and is used to index all objects for
searching/browsing.
More information on installing the DSpace Backend can be found in the Installing DSpace guide. More information on the REST API specifically can be
found in our REST Contract.
1. Initial static page via server-side rendering (SSR): When a user initially visits any page in the DSpace user interface (UI), this triggers server-side
rendering (SSR) via Angular Universal. This means that the UI (Javascript) application is run on the server by Node.js. The result is that a static
HTML page is generated, which will be sent back to the user.
a. This process of rendering the static HTML page will result in Node.js making requests to REST API to gather all the data necessary to
build the static HTML page.
2. Static page is dynamically replaced by UI application: The user briefly sees the generated static HTML page while the UI (Javascript) application
is downloading to their browser. This allows the user to immediately see the DSpace User Interface even before it becomes interactive. As soon
as the UI application finishes downloading, it dynamically replaces that static HTML page, making the User Interface interactive to the user. (The
time between the UI page appearing and becoming interactive is usually unnoticeable to a user.) This entire process is handled by Angular
Universal.
3. Interactions with the UI application send requests to the REST API (client-side rendering): As soon as the UI becomes interactive, it runs entirely
in the user's browser (as any other Javascript application). This means that when the user interacts with the application (by clicking links/buttons
or typing in fields, etc), this will send requests from the user's browser to the REST API (backend). This is called client-side rendering (CSR) as
all HTML is generated within the user's browser.
a. At this point, every action in the User Interface will generate one or more requests to the REST API to gather necessary data. These
requests are all visible in the user's browser (in the "Network" tab of the browser's "Developer tools").
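The difference between these two rendering modes can be observed from the command line. A sketch, assuming a locally running DSpace using the example URLs from the configuration section below (frontend on port 4000, backend on port 8080):

```shell
# Simulates a client that cannot run Javascript: the frontend answers with a
# fully rendered static HTML page (server-side rendering via Node.js).
curl -s http://localhost:4000/home

# The interactive UI instead calls the REST API directly and receives JSON
# (client-side rendering); these are the requests visible in the browser's "Network" tab.
curl -s http://localhost:8080/server/api
```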
Keep in mind, SSR can be potentially taxing for very large pages with a lot of objects or data display. This is because Node.js has to make requests to the
REST API to gather all the data for the page before rendering the static HTML. Because of this, we do also document some Performance Tuning
suggestions for the User Interface (e.g. there is an option to cache these SSR generated static pages in order to generate them less frequently).
Some bots and clients may use server-side rendering at all times
For bots or clients without the ability to run Javascript, every page request will trigger SSR (server-side rendering). This is because the static HTML page
can never be dynamically replaced by the User Interface application (in step 2 above). However, this behavior is necessary to support Search Engine
Optimization. Some search engine bots cannot run Javascript & therefore cannot index sites which do not generate static HTML pages.
Running the user interface in development mode disables SSR and may impact SEO
Running the user interface (frontend) in development mode will only utilize client-side rendering (CSR) (as described in step 3 above). This means server-
side rendering (SSR) will never occur, and all HTML will be generated in the user's browser. The result is that bots or clients without the ability to run
Javascript will be unable to interact with the site (which can negatively impact Search Engine Optimization).
Installing DSpace
Installation Overview
Installing the Backend (Server API)
Backend Requirements
Backend Installation
Installing the Frontend (User Interface)
Frontend Requirements
Frontend Installation
What Next?
Common Installation Issues
Troubleshoot an error or find detailed error messages
User Interface never appears (no content appears) or "Proxy server received an invalid response"
User Interface partially loads but then spins (never fully loads or some content doesn't load)
"500 Service Unavailable" from the User Interface
"No _links section found at..." error from User Interface
"RangeError: Maximum call stack size exceeded"
"XMLHttpRequest.. has been blocked by CORS policy" or "CORS error" or "Invalid CORS request"
Cannot login from the User Interface with a password that I know is valid
"403 Forbidden" error with a message that says "Access is denied. Invalid CSRF Token"
Using a Self-Signed SSL Certificate causes the Frontend to not be able to access the Backend
My REST API is running under HTTPS, but some of its "link" URLs are switching to HTTP
My User Interface's robots.txt has incorrect sitemap URLs
Cannot upload file from User Interface
Javascript heap out of memory
Solr responds with "Expected mime type application/octet-stream but got text/html" (404 Not Found)
Database errors occur when you run ant fresh_install
Installation Overview
Try out DSpace 7 before you install
If you'd like to quickly try out DSpace 7 before a full installation, see Try out DSpace 7 for instructions on a quick install via Docker.
As of version 7 (and above), the DSpace application is split into a "frontend" (User Interface) and a "backend" (Server API). Most institutions will want to
install BOTH. However, you can decide whether to run them on the same machine or separate machines.
The DSpace Frontend consists of a User Interface built on Angular.io. It is a Node.js web application, i.e. once it is built/compiled, it only requires
Node.js to run. It cannot be run "standalone", as it requires a valid DSpace Backend to function. The frontend provides all user-facing
functionality.
The DSpace Backend consists of a Server API ("server" webapp), built on Spring Boot. It is a Java web application. It can be run standalone,
however it has no user interface. The backend provides all machine-based interfaces, including the REST API, OAI-PMH, SWORD (v1 and v2)
and RDF.
We recommend installing the Backend first, as the Frontend requires a valid Backend to run properly.
Backend Requirements
UNIX-like OS or Microsoft Windows
Java JDK 11 or 17 (OpenJDK or Oracle JDK)
Apache Maven 3.5.4 or above (Java build tool)
Configuring a Maven Proxy
Apache Ant 1.10.x or later (Java build tool)
Relational Database (PostgreSQL)
PostgreSQL 12.x, 13.x, 14.x or 15.x (with pgcrypto installed)
Oracle (UNSUPPORTED AS OF 7.6)
Apache Solr 8.x (full-text index/search service)
Servlet Engine (Apache Tomcat 9, Jetty, Caucho Resin or equivalent)
(Optional) IP to City Database for Location-based Statistics
OpenJDK download and installation instructions can be found here: http://openjdk.java.net/install/. Most operating systems provide an easy path
to install OpenJDK. Just be sure to install the full JDK (development kit), and not the JRE (which is often the default).
Oracle's Java can be downloaded from the following location: http://www.oracle.com/technetwork/java/javase/downloads/index.html. Make sure to
download the appropriate version of the Java SE JDK.
Make sure to install the JDK and not just the JRE
DSpace requires the full JDK (Java Development Kit) be installed, rather than just the JRE (Java Runtime Environment). So, please be sure that you are
installing the full JDK and not just the JRE.
Newer versions of Java may work (e.g. JDK v12-16), but we do not recommend running them in Production. We highly recommend running only Java LTS
(Long Term Support) releases in Production, as non-LTS releases may not receive ongoing security fixes. As of this DSpace release, JDK 11 and JDK 17
are the two most recent Java LTS releases. As soon as the next Java LTS release is available, we will analyze it for compatibility with this release of
DSpace. For more information on Java releases, see the Java roadmaps for Oracle and/or OpenJDK.
Maven is necessary in the first stage of the build process to assemble the installation package for your DSpace instance. It gives you the flexibility to
customize DSpace using the existing Maven projects found in the [dspace-source]/dspace/modules directory or by adding in your own Maven project to
build the installation package for DSpace, and apply any custom interface "overlay" changes.
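As a sketch of this first build stage (the exact command appears later in the installation steps; [dspace-source] is a placeholder for your unpacked source directory):

```shell
# Assemble the DSpace installation package from the source tree
cd [dspace-source]
mvn package
```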
Maven can be downloaded from http://maven.apache.org/download.html. It is also provided via many operating system package managers.
Example (proxy settings in your Maven settings.xml file):
<settings>
.
.
<proxies>
<proxy>
<active>true</active>
<protocol>http</protocol>
<host>proxy.somewhere.com</host>
<port>8080</port>
<username>proxyuser</username>
<password>somepassword</password>
<nonProxyHosts>www.google.com|*.somewhere.com</nonProxyHosts>
</proxy>
</proxies>
.
.
</settings>
Apache Ant is required for the second stage of the build process (deploying/installing the application). First, Maven is used to construct the installer
([dspace-source]/dspace/target/dspace-installer), after which Ant is used to install/deploy DSpace to the installation directory.
Ant can be downloaded from the following location: http://ant.apache.org. It is also provided via many operating system package managers.
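As a sketch of this second build stage (the full sequence appears later in the installation instructions):

```shell
# Deploy/install DSpace from the installer assembled by Maven
cd [dspace-source]/dspace/target/dspace-installer
ant fresh_install
```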
PostgreSQL can be downloaded from http://www.postgresql.org/. It is also provided via many operating system package managers.
Make sure to select a version of PostgreSQL that is still under support from the PostgreSQL team.
If the version of Postgres provided by your package manager is outdated, you may wish to use one of the official PostgreSQL provided
repositories:
Linux users can select their OS of choice for detailed instructions on using the official PostgreSQL apt or yum repository: http://www.postgresql.org/download/linux/
Windows users will need to use the Windows installer: http://www.postgresql.org/download/windows/
Mac OSX users can choose their preferred installation method: http://www.postgresql.org/download/macosx/
Install the pgcrypto extension. It will also need to be enabled on your DSpace Database (see Installation instructions below for more info). The
pgcrypto extension allows DSpace to create UUIDs (universally unique identifiers) for all objects in DSpace, which means that (internal) object
identifiers are now globally unique and no longer tied to database sequences.
On most Linux operating systems (Ubuntu, Debian, RedHat), this extension is provided in the "postgresql-contrib" package in your
package manager. So, ensure you've installed "postgresql-contrib".
On Windows, this extension should be provided automatically by the installer (check your "[PostgreSQL]/share/extension" folder for files
starting with "pgcrypto")
Unicode (specifically UTF-8) support must be enabled (but this is enabled by default).
Once installed, you need to enable TCP/IP connections (DSpace uses JDBC):
In postgresql.conf: uncomment the line starting listen_addresses = 'localhost'. This is the default in recent PostgreSQL
releases, but you should at least check it.
Then tighten up security a bit by editing pg_hba.conf and adding this line:
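The line itself was not reproduced here; a typical rule (assuming the database and its user are both named "dspace" and connections come from localhost) might look like:

```
host dspace dspace 127.0.0.1/32 md5
```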
This should appear before any lines matching all databases, because the first matching rule governs.
Then restart PostgreSQL.
Oracle support has been removed as was previously announced in March 2022 on our mailing lists. See https://github.com/DSpace/DSpace/issues/8214
Details on acquiring Oracle can be downloaded from the following location: http://www.oracle.com/database/. You will need to create a database
for DSpace. Make sure that the character set is one of the Unicode character sets. DSpace uses UTF-8 natively, and it is suggested that the
Oracle database use the same character set. You will also need to create a user account for DSpace (e.g. dspace) and ensure that it has
permissions to add and remove tables in the database. Refer to the Quick Installation for more details.
NOTE: If the database server is not on the same machine as DSpace, you must install the Oracle client on the DSpace server and point the
tnsnames.ora and listener.ora files to the database on the Oracle server.
Solr can be obtained at the Apache Software Foundation site for Solr. You may wish to read portions of the quick-start tutorial to make yourself familiar
with Solr's layout and operation. Unpack a Solr .tgz or .zip archive in a place where you keep software that is not handled by your operating system's
package management tools, and arrange to have it running whenever DSpace is running. You should ensure that Solr's index directories will have plenty
of room to grow. You should also ensure that port 8983 is not in use by something else, or configure Solr to use a different port.
If you are looking for a good place to put Solr, consider /opt or /usr/local. You can simply unpack Solr in one place and use it. Or you can configure
Solr to keep its indexes elsewhere, if you need to – see the Solr documentation for how to do this.
It is not necessary to dedicate a Solr instance to DSpace, if you already have one and want to use it. Simply copy DSpace's cores to a place where they
will be discovered by Solr. See below.
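Copying the cores can be sketched as follows ([dspace] and [solr] are placeholders for your installation directories; the exact paths appear in the installation steps):

```shell
# Copy DSpace's Solr core configurations into the Solr instance so it discovers them
cp -R [dspace]/solr/* [solr]/server/solr/configsets
```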
Apache Tomcat 9. Tomcat can be downloaded from the following location: http://tomcat.apache.org. It is also provided via many operating
system package managers.
The Tomcat owner (i.e. the user that Tomcat runs as) must have read/write access to the DSpace installation directory (i.e. [dspace])
. There are a few common ways this may be achieved:
One option is to specifically give the Tomcat user (often named "tomcat") ownership of the [dspace] directories, for example:
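A sketch of this option ([dspace] is a placeholder; the actual user and group name for Tomcat varies by operating system):

```shell
# Give the Tomcat user ownership of the DSpace installation directory
chown -R tomcat:tomcat [dspace]
```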
Another option is to have Tomcat itself run as a new user named "dspace" (see installation instructions below). Some operating
systems allow the Tomcat "run as" user to be easily modified via an environment variable named TOMCAT_USER. This
option may be more desirable if you have multiple Tomcat instances running, and you do not want all of them to run under the
same Tomcat owner.
On Debian systems, you may also need to modify or override the "tomcat.service" file to specify the DSpace installation directory in the
list of ReadWritePaths. For example:
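A sketch of such an override (assuming DSpace is installed at /dspace; on systemd-based systems this could be created via "systemctl edit tomcat9"):

```
[Service]
ReadWritePaths=/dspace
```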
You need to ensure that Tomcat a) has enough memory to run DSpace, and b) uses UTF-8 as its default file encoding for international
character support. So ensure in your startup scripts (etc) that the following environment variable is set: JAVA_OPTS="-Xmx512M -
Xms64M -Dfile.encoding=UTF-8"
Modifications in [tomcat]/conf/server.xml: You also need to alter Tomcat's default configuration to support searching and browsing of
multi-byte UTF-8 correctly. You need to add a configuration option to the <Connector> element in [tomcat]/conf/server.xml: URIEncoding="UTF-8" e.g. if you're using the default Tomcat config, it should read:
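A sketch of the resulting element (the other attributes shown are the Tomcat 9 defaults):

```xml
<!-- Define a non-SSL HTTP/1.1 Connector on port 8080, with UTF-8 URI handling -->
<Connector port="8080"
           protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           URIEncoding="UTF-8" />
```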
You may change the port from 8080 by editing it in the file above, and by setting the variable CONNECTOR_PORT in server.xml. You
should set the URIEncoding even if you are running Tomcat behind a reverse proxy (Apache HTTPD, Nginx, etc.) via AJP.
Jetty or Caucho Resin
DSpace 7 has not been tested with Jetty or Caucho Resin, after the switch to Java 11
Older versions of DSpace were able to run on a Tomcat-equivalent servlet engine, such as Jetty (https://www.eclipse.org/jetty/) or
Caucho Resin (http://www.caucho.com/). If you choose to use a different servlet container, please ensure that it supports Servlet Spec
3.1 (or above).
Jetty and Resin are configured for correct handling of UTF-8 by default.
Backend Installation
1. Install all the Backend Requirements listed above.
2. Create a DSpace operating system user (optional) . As noted in the prerequisites above, Tomcat (or Jetty, etc) must run as an operating
system user account that has full read/write access to the DSpace installation directory (i.e. [dspace]). Either you must ensure the Tomcat
owner also owns [dspace], OR you can create a new "dspace" user account, and ensure that Tomcat also runs as that account:
useradd -m dspace
The choice that makes the most sense for you will probably depend on how you installed your servlet container (Tomcat/Jetty/etc). If you
installed it from source, you will need to create a user account to run it, and that account can be named anything, e.g. 'dspace'. If you used your
operating system's package manager to install the container, then a user account should have been created as part of that process and it will be
much easier to use that account than to try to change it.
3. Download the latest DSpace release from the DSpace GitHub Repository. You can choose to either download the zip or tar.gz file provided by
GitHub, or you can use "git" to checkout the appropriate tag (e.g. dspace-7.2) or branch.
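For example, using git (the tag shown matches the release version used in the examples below):

```shell
# Clone the DSpace repository and check out a release tag
git clone https://github.com/DSpace/DSpace.git dspace-source
cd dspace-source
git checkout dspace-7.2
```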
4. Unpack the DSpace software. After downloading the software, based on the compression file format, choose one of the following methods to
unpack your software:
a. unzip dspace-7.2.zip
For ease of reference, we will refer to the location of this unzipped version of the DSpace release as [dspace-source] in the remainder of
these instructions. After unpacking the file, the user may wish to change the ownership of the dspace-7.x folder to the "dspace" user.
(And you may need to change the group).
5. Database Setup
PostgreSQL:
Create a dspace database user (this user can have any name, but we'll assume you name it "dspace"). This is entirely
separate from the dspace operating-system user created above:
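The command itself was not reproduced here; a typical invocation (run against the PostgreSQL superuser account) is:

```shell
# Create a non-superuser "dspace" database user, prompting for its password
createuser --username=postgres --no-superuser --pwprompt dspace
```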
You will be prompted (twice) for a password for the new dspace user. Then you'll be prompted for the password of the
PostgreSQL superuser (postgres).
Create a dspace database, owned by the dspace PostgreSQL user. Similar to the previous step, this can only be done by a
"superuser" account in PostgreSQL (e.g. postgres):
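A typical invocation (again run against the PostgreSQL superuser account):

```shell
# Create a Unicode "dspace" database owned by the dspace database user
createdb --username=postgres --owner=dspace --encoding=UNICODE dspace
```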
You will be prompted for the password of the PostgreSQL superuser (postgres).
Finally, you MUST enable the pgcrypto extension on your new dspace database. Again, this can only be enabled by a
"superuser" account (e.g. postgres)
# Login to the database as a superuser, and enable the pgcrypto extension on this database
psql --username=postgres dspace -c "CREATE EXTENSION pgcrypto;"
The "CREATE EXTENSION" command should return with no result if it succeeds. If it fails or throws an error, it is likely you are
missing the required pgcrypto extension (see Database Prerequisites above).
Alternative method: How to enable pgcrypto via a separate database schema. While the above method of
enabling pgcrypto is perfectly fine for the majority of users, there may be some scenarios where a database
administrator would prefer to install extensions into a database schema that is separate from the DSpace tables.
Developers also may wish to install pgcrypto into a separate schema if they plan to "clean" (recreate) their
development database frequently. Keeping extensions in a separate schema from the DSpace tables will ensure
developers would NOT have to continually re-enable the extension each time you run a "./dspace database
clean". If you wish to install pgcrypto in a separate schema here's how to do that:
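The steps were not reproduced here; a sketch (the schema name "extensions" is an example, and all commands run as the PostgreSQL superuser):

```shell
# Create a separate schema to hold extensions
psql --username=postgres dspace -c "CREATE SCHEMA extensions;"
# Enable pgcrypto within that schema
psql --username=postgres dspace -c "CREATE EXTENSION pgcrypto WITH SCHEMA extensions;"
# Allow the dspace user to call functions in that schema
psql --username=postgres dspace -c "GRANT USAGE ON SCHEMA extensions TO dspace;"
# Put the new schema on the database's search path so pgcrypto functions resolve
psql --username=postgres dspace -c "ALTER DATABASE dspace SET search_path TO public,extensions;"
```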
Oracle (UNSUPPORTED AS OF 7.6):
Setting up DSpace to use Oracle is a bit different now. You will still need to get a copy of the Oracle JDBC driver, but
instead of copying it into a lib directory you will need to install it into your local Maven repository. (You'll need to download it first
from this location: http://www.oracle.com/technetwork/database/enterprise-edition/jdbc-112010-090769.html.) Run the following
command (all on one line):
mvn install:install-file
-Dfile=ojdbc6.jar
-DgroupId=com.oracle
-DartifactId=ojdbc6
-Dversion=11.2.0.4.0
-Dpackaging=jar
-DgeneratePom=true
You need to compile DSpace with an Oracle driver (ojdbc6.jar) corresponding to your Oracle version - update the version in [dsp
ace-source]/pom.xml E.g.:
<dependency>
<groupId>com.oracle</groupId>
<artifactId>ojdbc6</artifactId>
<version>11.2.0.4.0</version>
</dependency>
Create a database for DSpace. Make sure that the character set is one of the Unicode character sets. DSpace uses UTF-8
natively, and it is required that the Oracle database use the same character set. Create a user account for DSpace (e.g. dspace)
and ensure that it has permissions to add and remove tables in the database.
NOTE: You will need to ensure the proper db.* settings are specified in your local.cfg file (see next step), as the defaults
for all of these settings assume a PostgreSQL database backend.
db.url = jdbc:oracle:thin:@host:port/SID
# e.g. db.url = jdbc:oracle:thin:@//localhost:1521/xe
# NOTE: in db.url, SID is the SID of your database defined in tnsnames.ora
# the default Oracle port is 1521
# You may also use a full SID definition, e.g.
# db.url = jdbc:oracle:thin:@(description=(address_list=(address=(protocol=TCP)
(host=localhost)(port=1521)))(connect_data=(service_name=DSPACE)))
Later, during the Maven build step, don't forget to specify mvn -Ddb.name=oracle package
6. Initial Configuration (local.cfg): Create your own [dspace-source]/dspace/config/local.cfg configuration file. You may wish to
simply copy the provided [dspace-source]/dspace/config/local.cfg.EXAMPLE. This local.cfg file can be used to store any
configuration changes that you wish to make which are local to your installation (see local.cfg configuration file documentation). ANY setting may
be copied into this local.cfg file from the dspace.cfg or any other *.cfg file in order to override the default setting (see note below). For the initial
installation of DSpace, there are some key settings you'll likely want to override. Those are provided in the [dspace-source]/dspace
/config/local.cfg.EXAMPLE. (NOTE: Settings followed with an asterisk (*) are highly recommended, while all others are optional during
initial installation and may be customized at a later time.)
dspace.dir* - must be set to the [dspace] (installation) directory (NOTE: On Windows be sure to use forward slashes for the directory
path! For example: "C:/dspace" is a valid path for Windows.)
dspace.server.url* - complete URL of this DSpace backend (including port and any subpath). Do not end with '/'. For example:
http://localhost:8080/server
dspace.ui.url* - complete URL of the DSpace frontend (including port and any subpath). REQUIRED for the REST API to fully trust
requests from the DSpace frontend. Do not end with '/'. For example: http://localhost:4000
dspace.name - Human-readable, "proper" name of your server, e.g. "My Digital Library".
solr.server* - complete URL of the Solr server. DSpace makes use of Solr for indexing purposes. http://localhost:8983/solr unless
you changed the port or installed Solr on some other host.
default.language - Default language for all metadata values (defaults to "en_US")
db.url* - The full JDBC URL to your database (examples are provided in the local.cfg.EXAMPLE)
db.driver* - Which database driver to use for PostgreSQL (default should be fine)
db.dialect* - Which database dialect to use for PostgreSQL (default should be fine)
db.username* - the database username used in the previous step.
db.password* - the database password used in the previous step.
db.schema* - the database schema to use (examples are provided in the local.cfg.EXAMPLE)
mail.server - fully-qualified domain name of your outgoing mail server.
mail.from.address - the "From:" address to put on email sent by DSpace.
feedback.recipient - mailbox for feedback mail.
mail.admin - mailbox for DSpace site administrator.
alert.recipient - mailbox for server errors/alerts (not essential but very useful!)
registration.notify - mailbox for emails when new users register (optional)
Your local.cfg file can override ANY settings from other *.cfg files in DSpace
The provided local.cfg.EXAMPLE only includes a small subset of the configuration settings available with DSpace. It provides a
good starting point for your own local.cfg file.
However, you should be aware that ANY configuration can now be copied into your local.cfg to override the default settings. This
includes ANY of the settings/configurations in dspace.cfg or in any other *.cfg file shipped with DSpace.
Individual settings may also be commented out or removed in your local.cfg, in order to re-enable default settings.
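As a sketch, a minimal local.cfg for a local development install might look like the following. All values are illustrative (they mirror the examples given for each setting above) — adjust hostnames, paths and credentials to your own environment:

```properties
# [dspace-source]/dspace/config/local.cfg (illustrative sketch only)
dspace.dir = /dspace
dspace.server.url = http://localhost:8080/server
dspace.ui.url = http://localhost:4000
dspace.name = My Digital Library
solr.server = http://localhost:8983/solr
db.url = jdbc:postgresql://localhost:5432/dspace
db.username = dspace
db.password = [your-db-password]
```

Any other setting from dspace.cfg (or another *.cfg file) can be appended to this file in the same key = value style.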
7. Create DSpace directory: Create the directory into which DSpace will be installed (i.e. [dspace]) and ensure it is owned by the dspace UNIX user:
mkdir [dspace]
chown dspace [dspace]
8. Build the Installation Package: As the dspace UNIX user, generate the DSpace installation package:
cd [dspace-source]
mvn package
Without any extra arguments, the DSpace installation package is initialized for PostgreSQL. If you want to use Oracle instead, you should build
the DSpace installation package as follows:
mvn -Ddb.name=oracle package
9. Install DSpace Backend: As the dspace UNIX user, install DSpace to [dspace]:
cd [dspace-source]/dspace/target/dspace-installer
ant fresh_install
To see a complete list of build targets, run: ant help. The most likely thing to go wrong here is the test of your database connection. See the
Common Installation Issues section below for more details.
10. Initialize your Database: While this step is optional (as the DSpace database should auto-initialize itself on first startup), it's always good to
verify one last time that your database connection is working properly. To initialize the database, run:
[dspace]/bin/dspace database migrate
a. After running this script, it's a good idea to run "./dspace database info" to check that your database has been fully initialized. A fully
initialized database should list the state of all migrations as either "Success" or "Out of Order". If any migrations have failed or are still
listed as "Pending", then you need to check your "dspace.log" for possible "ERROR" messages. If any errors appeared, you will need to
resolve them before continuing.
11. Deploy Server web application: The DSpace backend consists of a single "server" webapp (in [dspace]/webapps/server). You need to
deploy this webapp into your Servlet Container (e.g. Tomcat). Generally, there are two options (or techniques) which you could use...either
configure Tomcat to find the DSpace "server" webapp, or copy the "server" webapp into Tomcat's own webapps folder.
Technique A. Tell your Tomcat/Jetty/Resin installation where to find your DSpace web application(s). As an example, in the file [tomcat
]/conf/server.xml you can add a Context member under the `<Host>` (but replace [dspace] with your installation location):
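A sketch of such a Context entry follows. The docBase path is a placeholder — point it at your actual [dspace]/webapps/server directory, and consult the Tomcat documentation for your Tomcat version for additional Context attributes:

```xml
<!-- Inside <Host> in [tomcat]/conf/server.xml, or in a separate context file -->
<Context path="/server" docBase="[dspace]/webapps/server" />
```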
The name of the file (not including the suffix ".xml") will be the name of the context, so for example server.xml defines the context at
http://host:8080/server. To define the root context (http://host:8080/), name that context's file ROOT.xml. Optionally, you
can also choose to install the old, deprecated "rest" webapp (see step 12 below).
Technique B. Simple and complete. You copy any (or all) of the DSpace web application(s) you wish to use from the [dspace]/webapps
directory to the appropriate directory in your Tomcat/Jetty/Resin installation. For example:
cp -R [dspace]/webapps/* [tomcat]/webapps (This will copy all the web applications to Tomcat).
cp -R [dspace]/webapps/server [tomcat]/webapps (This will copy only the Server web application to Tomcat.)
To define the root context (http://host:8080/), name that context's directory ROOT.
12. Optionally, also install the deprecated DSpace 6.x REST API web application. If you previously used the DSpace 6.x REST API, for
backwards compatibility the old, deprecated "rest" webapp is still available to install (in [dspace]/webapps/rest). It is NOT used by the
DSpace frontend. So, most users should skip this step.
13. Copy Solr cores: DSpace installation creates a set of four empty Solr cores already configured.
a. Copy them from [dspace]/solr to the place where your Solr instance will discover them. For example:
cp -R [dspace]/solr/* [solr]/server/solr/configsets
# Make sure everything is owned by the system user who owns Solr
# Usually this is a 'solr' user account
# See https://solr.apache.org/guide/8_1/taking-solr-to-production.html#create-the-solr-user
chown -R solr:solr [solr]/server/solr/configsets
b. Restart Solr so it discovers the new cores:
[solr]/bin/solr restart
c. You can check the status of Solr and your new DSpace cores by using its administrative web interface. Browse to ${solr.server} (e.g.
http://localhost:8983/solr/) to see if Solr is running well, then look at the cores by selecting (on the left) Core Admin or
using the Core Selector drop-down list.
i. For example, to test that your "search" core is setup properly, try accessing the URL ${solr.server}/search/select. It
should run an empty query against the "search" core, returning an empty JSON result. If it returns an error, then that means your
"search" core is missing or not installed properly.
14. Create an Administrator Account: Create an initial administrator account from the command line:
[dspace]/bin/dspace create-administrator
15. Initial Startup! Now the moment of truth! Start up (or restart) Tomcat/Jetty/Resin, then verify the backend webapps respond at URLs like the following:
a. REST API Interface - (e.g.) http://dspace.myu.edu:8080/server/
b. OAI-PMH Interface - (e.g.) http://dspace.myu.edu:8080/server/oai/request?verb=Identify
c. For an example of what the default backend looks like, visit the Demo Backend: https://demo.dspace.org/server/
16. Setup scheduled tasks for behind-the-scenes processes: For all features of DSpace to work properly, there are some scheduled tasks you
MUST setup to run on a regular basis. Some examples are tasks that help create thumbnails (for images), do full-text indexing (of textual content)
and send out subscription emails. See the Scheduled Tasks via Cron for more details.
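As an illustrative sketch only (the exact set of tasks and recommended schedules are covered in Scheduled Tasks via Cron), a crontab for the dspace user might contain entries like:

```
# Run the media filter (create thumbnails & extract full text) nightly at 01:00
0 1 * * * [dspace]/bin/dspace filter-media
# Update the Discovery (search/browse) index nightly at 02:00
0 2 * * * [dspace]/bin/dspace index-discovery
```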
17. Production Installation (adding HTTPS support): Running the DSpace Backend on HTTP & port 8080 is only usable for local development
environments (where you are running the UI and REST API from the same machine, and only accessing them via localhost URLs). If you want
to run DSpace in Production, you MUST run the backend with HTTPS support (otherwise logins will not work outside of your local domain).
a. For HTTPS support, we recommend installing either Apache HTTPD or Nginx, configuring SSL at that level, and proxying all requests to
your Tomcat installation. Keep in mind, if you want to host both the DSpace Backend and Frontend on the same server, you can use
one installation of Apache HTTPD or NGinx to manage HTTPS/SSL and proxy to both.
b. Apache HTTPD: These instructions are specific to Apache HTTPD, but a similar setup can be achieved with NGinx (see below)
i. Install Apache HTTPD, e.g. sudo apt install apache2
ii. Install mod_headers, mod_proxy and mod_proxy_ajp (or mod_proxy_http) modules, e.g. sudo a2enmod headers; sudo
a2enmod proxy; sudo a2enmod proxy_ajp
1. Alternatively, you can choose to use mod_proxy_http to create an http proxy. A separate example is commented out
below
iii. For mod_proxy_ajp to communicate with Tomcat, you'll need to enable Tomcat's AJP connector in your Tomcat's server.xml:
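A typical AJP connector entry looks like the following sketch. The address and secretRequired values shown are common choices for a localhost-only connector; check the Tomcat documentation for your Tomcat version, since defaults for these attributes vary:

```xml
<!-- In [tomcat]/conf/server.xml, alongside the existing HTTP Connector -->
<Connector protocol="AJP/1.3" port="8009" address="::1"
           redirectPort="8443" secretRequired="false" />
```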
vi. Now, setup a new VirtualHost for your site (using HTTPS / port 443) which proxies all requests to Tomcat's AJP connector
(running on port 8009)
<VirtualHost _default_:443>
# Add your domain here. We've added "my.dspace.edu" as an example
ServerName my.dspace.edu
.. setup your host how you want, including log settings...
# Most installs will need these options enabled to ensure DSpace knows its hostname and
scheme (http or https)
# Also required to ensure correct sitemap URLs appear in /robots.txt for User Interface.
ProxyPreserveHost On
RequestHeader set X-Forwarded-Proto https
SSLEngine on
SSLCertificateFile [full-path-to-PEM-cert]
SSLCertificateKeyFile [full-path-to-cert-KEY]
# LetsEncrypt certificates (and possibly others) may require a chain file be specified
# in order for the UI / Node.js to validate the HTTPS connection.
#SSLCertificateChainFile [full-path-to-chain-file]
# Proxy all HTTPS requests to "/server" from Apache to Tomcat via AJP connector
ProxyPass /server ajp://localhost:8009/server
ProxyPassReverse /server ajp://localhost:8009/server
</VirtualHost>
c. Nginx: Alternatively, similar proxy settings can be added to an HTTPS server block in Nginx:
# Proxy all HTTPS requests to "/server" from NGinx to Tomcat on port 8080
location /server {
proxy_set_header X-Forwarded-Proto https;
proxy_set_header X-Forwarded-Host $host;
proxy_pass http://localhost:8080/server;
}
d. After switching to HTTPS, make sure to go back and update the URLs (primarily dspace.server.url) in your local.cfg to match the
new URL of your backend (REST API). This will require briefly rebooting Tomcat.
Below instructions are specific to 7.2 (or later)
The Frontend Instructions below are specific to 7.2 or later. For Frontend Installation instructions for 7.0 or 7.1, see 7.0-7.1 Frontend Installation
Frontend Requirements
UNIX-like OS or Microsoft Windows
Node.js (v16.x or v18.x)
Yarn (v1.x)
PM2 (or another Process Manager for Node.js apps) (optional, but recommended for Production)
DSpace 7.x Backend (see above)
Node.js can be found at https://nodejs.org/. It may be available through your Linux distribution's package manager. We recommend running a
Long Term Support (LTS) version (even numbered releases). Non-LTS versions (odd numbered releases) are not recommended.
Node.js is a Javascript runtime that also provides npm (Node Package Manager). It is used to both build and run the frontend.
NOTE: Node v14 also should work. However, that version is nearing end-of-life. We recommend updating to Node 16 or 18.
Yarn (v1.x)
Yarn v1.x is available at https://classic.yarnpkg.com/. It can usually be installed via NPM (or through your Linux distribution's package manager).
We do NOT currently support Yarn v2.
# You may need to run this command using "sudo" if you don't have proper privileges
npm install --global yarn
PM2 (or another Process Manager for Node.js apps) (optional, but recommended for Production)
In Production scenarios, we highly recommend starting/stopping the User Interface using a Node.js process manager. There are several
available, but our current favorite is PM2. The rest of this installation guide assumes you are using PM2.
PM2 is very easily installed via NPM
# You may need to run this command using "sudo" if you don't have proper privileges
npm install --global pm2
Frontend Installation
1. Download Code (to [dspace-angular]): Download the latest dspace-angular release from the DSpace GitHub repository. You can choose to
either download the zip or tar.gz file provided by GitHub, or you can use "git" to checkout the appropriate tag (e.g. dspace-7.2) or branch.
a. NOTE: For the rest of these instructions, we'll refer to the source code location as [dspace-angular].
2. Install Dependencies: Install all required local dependencies by running the following from within the unzipped [dspace-angular] directory
# change directory to our repo
cd [dspace-angular]
# install the local dependencies
yarn install
# NOTE: Some dependencies occasionally get overly strict over exact versions of Node & Yarn.
# If you are running a supported version of Node & Yarn, but see a message like
# `The engine "node" is incompatible with this module.`, you can disregard it using this flag:
# yarn install --ignore-engines
3. Build/Compile: Build the User Interface for Production. This builds source code (under [dspace-angular]/src/) to create a compiled
version of the User Interface in the [dspace-angular]/dist folder. This /dist folder is what we will deploy & run to start the UI.
yarn build:prod
a. You only need to rebuild the UI application if you change source code (under [dspace-angular]/src/). Simply changing the
configurations (e.g. config.prod.yml, see below) does not require a rebuild, but only requires restarting the UI.
4. Deployment (to [dspace-ui-deploy]): (Only recommended for Production setups) Choose/Create a directory on your server where you wish to
run the compiled User Interface. We'll call this [dspace-ui-deploy].
[dspace-ui-deploy] vs [dspace-angular]
[dspace-angular] is the directory where you've downloaded and built the UI source code (per the instructions above). For deployment
/running the UI, we recommend creating an entirely separate [dspace-ui-deploy] directory. This keeps your running, production User
Interface separate from your source code directory and also minimizes downtime when rebuilding your UI. You may even choose to deploy to a [
dspace-ui-deploy] directory on a different server (and copy the /dist directory over via FTP or similar).
If you are installing the UI for the first time, or just want a simple setup, you can choose to have [dspace-ui-deploy] and [dspace-angular] be the
same directory. This would mean you don't have to copy your /dist folder to another location. However, the downside is that your running site will
become unresponsive whenever you do a re-build/re-compile (i.e. rerun "yarn build:prod"), as this build process will first delete the
[dspace-angular]/dist directory before rebuilding it.
a. Copy the entire [dspace-angular]/dist/ folder to this location. For example:
cp -r [dspace-angular]/dist [dspace-ui-deploy]
b. WARNING: At this time, you MUST copy the entire "dist" folder and make sure NOT to rename it. Therefore, the directory structure
should look like this:
[dspace-ui-deploy]
/dist
/browser (compiled client-side code)
/server (compiled server-side code, including "main.js")
/config (Optionally created in the "Configuration" step below)
/config.prod.yml (Optionally created in the "Configuration" step below)
c. NOTE: the OS account which runs the UI via Node.js (see below) MUST have write privileges to the [dspace-ui-deploy] directory
(because on startup, the runtime configuration is written to [dspace-ui-deploy]/dist/browser/assets/config.json)
5. Configuration: You have two options for User Interface Configuration, Environment Variables or YAML-based configuration (config.prod.yml)
. Choose one!
a. YAML configuration: Create a "config.prod.yml" at [dspace-ui-deploy]/config/config.prod.yml. You may wish to use the
[dspace-angular]/config/config.example.yml as a starting point. This config.prod.yml file can be used to override any of
the default configurations listed in the config.example.yml (in that same directory). At a minimum this file MUST include a "rest"
section (and may also include a "ui" section), similar to the following (keep in mind, you only need to include settings that you need to
modify).
Example config.prod.yml
# The "ui" section defines where you want Node.js to run/respond. It often is a *localhost* (non-
public) URL, especially if you are using a Proxy.
# In this example, we are setting up our UI to just use localhost, port 4000.
# This is a common setup for when you want to use Apache or Nginx to handle HTTPS and proxy
requests to Node on port 4000
ui:
ssl: false
host: localhost
port: 4000
nameSpace: /
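The text above notes the file MUST also include a "rest" section. A sketch of that section, using the same illustrative values as the environment-variable example below (replace api.mydspace.edu with your backend's actual public hostname):

```yaml
# The "rest" section defines where the UI will find the REST API (your DSpace backend).
rest:
  ssl: true
  host: api.mydspace.edu
  port: 443
  nameSpace: /server
```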
b. Environment variables: Every configuration in the UI may be specified via an Environment Variable. See Configuration Override in the
User Interface Configuration documentation for more details. For example, the below environment variables provide the same setup as the
config.prod.yml example above.
# "ui" section
DSPACE_UI_SSL = false
DSPACE_UI_HOST = localhost
DSPACE_UI_PORT = 4000
DSPACE_UI_NAMESPACE = /
# "rest" section
DSPACE_REST_SSL = true
DSPACE_REST_HOST = api.mydspace.edu
DSPACE_REST_PORT = 443
DSPACE_REST_NAMESPACE = /server
i. NOTE: When using PM2, some may find it easier to use Environment variables, as it allows you to specify DSpace UI configs
within your PM2 configuration. See PM2 instructions below.
c. Configuration Hints:
i. See the User Interface Configuration documentation for a list of all available configurations.
ii. In the "ui" section above, you may wish to start with "ssl: false" and "port: 4000" just to be certain that everything else is working
properly before adding HTTPS support. KEEP IN MIND, we highly recommend always using HTTPS for Production. (See
section on HTTPS below)
iii. (Optionally) Test the connection to your REST API from the UI from the command-line. This is not required, but it can
sometimes help you discover immediate configuration issues if the test fails.
1. If you are using YAML configs, copy your config.prod.yml back into your source code folder at [dspace-angular]
/config/config.prod.yml
2. From [dspace-angular], run yarn test:rest. This script will attempt a basic Node.js connection to the REST
API that is configured in your "config.prod.yml" file and validate the response.
3. A successful connection should return a 200 Response and all JSON validation checks should return "true"
4. If you receive a connection error or different response code, you MUST fix your REST API before the UI will be able to
work. See also the "Common Installation Issues" below. If you receive an SSL error, see "Using a Self-Signed SSL
Certificate causes the Frontend to not be able to access the Backend"
iv. When using a subpath (nameSpace) in your UI server base URL (e.g. "http://localhost:4000/mysite/" instead of "http://localhost:
4000/"), you must make sure that the URL without the subpath is added to the rest.cors.allowed-origins list in
[dspace]/config/modules/rest.cfg or the local.cfg override. The default value used for this configuration assumes that
Origin and DSpace URL are identical, but CORS origins do not contain a subpath.
6. Start up the User Interface: The compiled User Interface only requires Node.js to run. However, most users may want to use PM2 (or a similar
Node.js process manager) in Production to provide easier logging and restart tools.
a. Quick Start: To quickly startup / test the User Interface, you can just use Node.js. This is only recommended for quickly testing the UI is
working, as no logs are available:
cd [dspace-ui-deploy]
node ./dist/server/main.js
b. Run via PM2: Using PM2 (or a different Node.js process manager) is highly recommended for Production scenarios. Here's an example
of a Production setup of PM2.
i. First you need to create a PM2 JSON configuration file which will run the User Interface. This file can be named anything &
placed wherever you like, but you may want to save it to your deployment directory (e.g. [dspace-ui-deploy]/dspace-
ui.json).
dspace-ui.json
{
"apps": [
{
"name": "dspace-ui",
"cwd": "/full/path/to/dspace-ui-deploy",
"script": "dist/server/main.js",
"instances": "max",
"exec_mode": "cluster",
"env": {
"NODE_ENV": "production"
}
}
]
}
1. NOTE: The "cwd" setting MUST correspond to your [dspace-ui-deploy] folder path.
2. NOTE #2: The "exec_mode" and "instances" settings are optional but highly recommended. Setting "exec_mode" to
"cluster" enables PM2's cluster mode. This will provide better performance in production, as it allows PM2 to scale
your site across multiple CPUs. The "instances" setting tells PM2 how many CPUs to scale across ("max" means all
CPUs, but you can also specify a number).
3. NOTE #3: If you wanted to configure your UI using Environment Variables, specify those Environment Variables under
the "env" section. For example:
"env": {
"NODE_ENV": "production",
"DSPACE_REST_SSL": "true",
"DSPACE_REST_HOST": "demo.dspace.org",
"DSPACE_REST_PORT": "443",
"DSPACE_REST_NAMESPACE": "/server"
}
4. NOTE #4: If you are using Windows, there are two other rules to keep in mind in this JSON configuration. First, all
paths must include double backslashes (e.g. "C:\\dspace-ui-deploy"). Second, "cluster" mode is required. Here's an
example configuration for Windows:
{
"apps": [
{
"name": "dspace-ui",
"cwd": "C:\\full\\path\\to\\dspace-ui-deploy",
"script": "dist\\server\\main.js",
"instances": "max",
"exec_mode": "cluster",
"env": {
"NODE_ENV": "production"
}
}
]
}
ii. Now, start the application with PM2, using the configuration file you created in the previous step:
pm2 start dspace-ui.json
# If you need to change your PM2 configs, delete the old config and restart
# pm2 delete dspace-ui.json
# pm2 start dspace-ui.json
1. Now, setup (or update) the new VirtualHost for your UI site (preferably using HTTPS / port 443) which proxies all
requests to PM2 running on port 4000.
<VirtualHost _default_:443>
# Add your domain here. We've added "my.dspace.edu" as an example
ServerName my.dspace.edu
.. setup your host how you want, including log settings...
# Most installs will need these options enabled to ensure DSpace knows its
hostname and scheme (http or https)
# Also required to ensure correct sitemap URLs appear in /robots.txt for User
Interface.
ProxyPreserveHost On
RequestHeader set X-Forwarded-Proto https
# These SSL settings are identical to those for the backend installation (see
above)
# If you already have the backend running HTTPS, just add the new Proxy settings
below.
SSLEngine on
SSLCertificateFile [full-path-to-PEM-cert]
SSLCertificateKeyFile [full-path-to-cert-KEY]
# LetsEncrypt certificates (and possibly others) may require a chain file be
specified
# in order for the UI / Node.js to validate the HTTPS connection.
#SSLCertificateChainFile [full-path-to-chain-file]
# These Proxy settings are for the backend. They are described in the backend
installation (see above)
# If you already have the backend running HTTPS, just append the new Proxy
settings below.
# Proxy all HTTPS requests to "/server" from Apache to Tomcat via AJP connector
# (In this example: https://my.dspace.edu/server/ will display the REST API)
ProxyPass /server ajp://localhost:8009/server
ProxyPassReverse /server ajp://localhost:8009/server
# [NEW FOR UI:] Proxy all HTTPS requests from Apache to PM2 on localhost, port
4000
# NOTE that this proxy URL must match the "ui" settings in your config.prod.yml
# (In this example: https://my.dspace.edu/ will display the User Interface)
ProxyPass / http://localhost:4000/
ProxyPassReverse / http://localhost:4000/
</VirtualHost>
2. Alternatively, in Nginx, add similar proxy settings to your HTTPS server block:
# These Proxy settings are for the backend. They are described in the backend installation (see above)
# Proxy all HTTPS requests to "/server" from NGinx to Tomcat on port 8080
location /server {
proxy_set_header X-Forwarded-Proto https;
proxy_set_header X-Forwarded-Host $host;
proxy_pass http://localhost:8080/server;
}
# [NEW FOR UI:] Proxy all HTTPS requests from NGinx to PM2 on localhost, port 4000
# NOTE that this proxy URL must match the "ui" settings in your config.prod.yml
# (In this example: https://my.dspace.edu/ will display the User Interface)
location / {
proxy_set_header X-Forwarded-Proto https;
proxy_set_header X-Forwarded-Host $host;
proxy_pass http://localhost:4000/;
}
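For completeness, the Nginx location blocks above live inside an HTTPS server block; a minimal sketch (certificate paths and domain are placeholders, matching the Apache example above):

```nginx
server {
    listen 443 ssl;
    server_name my.dspace.edu;
    ssl_certificate     [full-path-to-PEM-cert];
    ssl_certificate_key [full-path-to-cert-KEY];
    # ...the "location /server" and "location /" blocks shown above go here...
}
```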
iv. HINT#1: Because you are using a proxy for HTTPS support, in your User Interface Configuration, your "ui" settings will still
have "ssl: false" and "port: 4000". This is perfectly OK!
v. HINT#2: to force the UI to connect to the backend using HTTPS, you should verify your "rest" settings in your User Interface
Configuration match the "dspace.server.url" in your backend's "local.cfg" and both use the HTTPS URL. So, if your backend
(REST API) is proxied to https://my.dspace.edu/server/, both those settings should specify that HTTPS URL.
vi. HINT#3: to force the backend to recognize the HTTPS UI, make sure the "dspace.ui.url" in your backend's "local.cfg"
is updated to use the new HTTPS UI URL (e.g. https://my.dspace.edu).
b. (Alternatively) You can use the basic HTTPS support built into our UI and Node server. (This may currently be better for non-Production
environments as it has not been well tested)
i. Create a [dspace-ui-deploy]/config/ssl/ folder and add a key.pem and cert.pem to that folder (they must have
those exact names)
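For a quick non-production test, a self-signed pair with those exact names can be generated with OpenSSL. This is a sketch only — browsers and the UI will warn about self-signed certificates (see the self-signed-certificate issue below), and the CN is a placeholder hostname:

```shell
# Generate a self-signed key/cert pair named exactly key.pem and cert.pem
# (valid for 365 days, no passphrase on the key)
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -subj "/CN=localhost" \
  -keyout key.pem -out cert.pem
```

Move both files into [dspace-ui-deploy]/config/ssl/ afterwards.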
ii. In your User Interface Configuration, go back and update the following:
1. Set "ui > ssl" to true
2. Update "ui > port" to be 443
a. In order to run Node/PM2 on port 443, you will also likely need to grant node special permissions to bind to a
privileged port (for example, via setcap on Linux).
iii. Restart the UI
iv. Keep in mind, while this setup is simple, you may not have the same level of detailed, Production logs as you would with
Apache HTTPD or Nginx
What Next?
After a successful installation, you may want to take a closer look at
Performance Tuning DSpace: If you are noticing any slowness in your Production site, we have a guide for how you might speed things up.
User Interface Customization: Documentation on customizing the User Interface with your own branding / theme(s)
User Interface Configuration: Additional configurations available in the User Interface.
Submission User Interface: Options to configure/customize the default Submission (deposit) process
Configurable Workflow: Options to configure/customize the default Workflow approval process
Scheduled Tasks via Cron : Several DSpace features require that a command-line script is run regularly via cron.
Configuration Reference : Details on the configuration options available to the Backend
Handle Server installation: Optionally, you may wish to enable persistent URLs for your DSpace site using CNRI's Handle.Net Registry
Statistics and Metrics: Optionally, you may wish to configure one (or more) Statistics options within DSpace, including Google Analytics and
(internal) Solr Statistics
Multilingual Support: Optionally, you may wish to enable multilingual support in your DSpace site.
Using DSpace : Various other pages which describe usage and additional configurations related to other DSpace features.
System Administration: Various other pages which describe additional backend installation options/configurations.
In case you hit a problem during or after installation:
Visit the Troubleshoot an error guide for tips on locating the cause of the error
Review Common Installation Issues (see below)
Ask for Support via one of the support options documented on that page
User Interface never appears (no content appears) or "Proxy server received an invalid response"
Chances are your User Interface (UI) is throwing a severe error or not starting properly. The best way to debug this issue would be to start the User
Interface in development mode to see if it can give you a more descriptive error.
1. First, create a [dspace-ui-deploy]/config/config.dev.yml configuration file for development. This file supports the same configs as
your existing config.prod.yml. So, you can copy over any settings you want to test out.
2. Start the UI in development mode (this doesn't require a proxy like Apache or Nginx)
yarn start:dev
3. This will boot up the User Interface on whatever port you specified in "config.dev.yml"
4. At this point, attempt to access the UI from your web browser. Even if it isn't fully working, you should be able to still get more information from
your browser's DevTools regarding the underlying error. See the Troubleshoot an error page, look for the section on "DSpace 7.x". It has a guide
for locating UI error messages in your browser's Developer Tools.
Once you've found the underlying error, it may be one of the "common installation issues" listed below.
User Interface partially loads but then spins (never fully loads or some content doesn't load)
Chances are your User Interface (UI) is throwing an error or receiving an unexpected response from the REST API backend. Since the UI is Javascript
based, it runs entirely in your browser. That means the error it's hitting is most easily viewed in your browser (and in fact the error may never appear in log
files).
See the Troubleshoot an error page, look for the section on "DSpace 7.x". It has a guide for locating UI error messages in your browser's Developer Tools.
"No _links section found at..." error from User Interface
When starting up the User Interface for the first time, you may see an error that looks similar to this...
This error means that the UI is unable to contact the REST API listed at [rest-api-url] and/or the response from that [rest-api-url] is
unexpected (as it doesn't contain the "_links" to the endpoints available at that REST API). A valid DSpace [rest-api-url] will respond with JSON
similar to our demo API at https://demo.dspace.org/server/api
First, test the connection to your REST API from the UI from the command-line.
# This script will attempt a basic Node.js connection to the REST API
# configured in your "[dspace-angular]/config/config.prod.yml" and
# validate the response. (NOTE: config.prod.yml MUST be copied
# to [dspace-angular]/config/ for this script to find it!)
yarn test:rest
A successful connection should return a 200 Response and all JSON validation checks should return "true".
If you receive a connection error or different response code, you MUST fix your REST API before the UI will be able to work (see additional hints
below for likely causes).
Per the Apache docs, you can also use the SSLCertificateFile setting to specify intermediate CA certificates along with the main cert.
For self-signed certs, see also "Using a Self-Signed SSL Certificate causes the Frontend to not be able to access the Backend" common
issue listed below.
Something may be blocking access to the REST API. This may be a proxy issue, a firewall issue, or something else generally blocking the port (e.g. port 443 for SSL).
Verify that you can access the REST API from the machine where Node.js is running (i.e. your UI is running). For example try a simple
"wget" or "curl" to verify the REST API is returning expected JSON similar to our demo API at https://demo.dspace.org/server/api
# Attempt to access the REST API via HTTPS from the command-line on the machine where Node.js is running.
# If this fails or throws an SSL cert error, you must fix it.
wget https://[rest.host]/server/api
In most production scenarios, your REST API should be publicly accessible on the web, unless you are guaranteed that all your
DSpace users will access the site behind a VPN or similar. So, this "No _links section found" error may also occur if you are accessing
the UI from a client computer/web browser which is unable to access the REST API.
If none of the above suggestions helped, you may want to look closer at the request logs in your browser (using browser's Dev Tools) and server-side logs,
to be sure that the requests from your UI are going where you expect, and see if they appear also on the backend. Tips for finding these logs can be
found in the "DSpace 7.x" section of our Troubleshoot an error guide.
This error means that the UI is trying to contact your REST API, but is having issues doing so (possibly because a proxy or an HTTP to HTTPS redirect is causing issues or a redirect loop).
Double check your "dspace.server.url" setting in your local.cfg on the backend. Is it the same URL you use in your browser to access the backend?
Keep in mind the mode (http vs https), domain, port, and subpath(s) all must match, and it must not end in a trailing slash.
Also double check the "rest" section of your config.*.yml (or environment.*.ts for 7.1 or 7.0) configuration file for the User Interface. Make sure it's
also pointing to the exact same URL as that "dspace.server.url" setting. Again, check the mode, domain, port and paths all match exactly.
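As a sketch, the "rest" section of config.prod.yml might look like the following (the host, port, and namespace shown are placeholders; together they must equal your backend's "dspace.server.url"):

```yaml
# Sketch of the "rest" section of [dspace-angular]/config/config.prod.yml
# These placeholder values correspond to a backend whose
# dspace.server.url = https://demo.dspace.org/server
rest:
  ssl: true
  host: demo.dspace.org
  port: 443
  nameSpace: /server
```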
"XMLHttpRequest.. has been blocked by CORS policy" or "CORS error" or "Invalid CORS request"
If you are seeing a CORS error in your browser, this means that you are accessing the REST API via an "untrusted" client application. To fix this error,
you must change your REST API / Backend configuration to trust the application.
By default, the DSpace REST API / Backend will only trust the application at dspace.ui.url. Therefore, you should first verify that your dspace.ui.url setting (in your local.cfg) exactly matches the primary URL of your User Interface (i.e. the URL you see in the browser). This must be an exact match: mode (http vs https), domain, port, and subpath(s) all must match.
If you need to trust additional client applications / URLs, those MUST be added to the rest.cors.allowed-origins configuration. See REST
API for details on this configuration.
Also, check your Tomcat (or servlet container) log files. If Tomcat throws a syntax or other major error, it may return an error response that
triggers a CORS error. In this scenario, the CORS error is only a side effect of a larger error.
If you modify either of the above settings, you will need to restart Tomcat for the changes to take effect.
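As a sketch, trusting an additional client application in your local.cfg might look like this (the second URL is a hypothetical example):

```
# Trust the UI (the default) plus one additional client application
rest.cors.allowed-origins = ${dspace.ui.url}, https://another-client.example.org
```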
Cannot login from the User Interface with a password that I know is valid
If you cannot login via the user interface with a valid password, you should check to see what underlying error is being returned by the REST API. The
easiest way to do this is by using your web browser's Dev Tools as described in our Troubleshoot an error guide (see the "Try this first" section for DSpace
7).
If the password is valid, more than likely you'll see the underlying error is a "403 Forbidden" error with a message that says "Access is denied. Invalid CSRF Token" (see hints on solving this in the very next section).
"403 Forbidden" error with a message that says "Access is denied. Invalid CSRF Token"
First, double check that you are seeing that exact error message. A 403 Forbidden error may be thrown in a variety of scenarios. For example, a 403
may be thrown if a page requires a login, if you have entered an invalid username or password, or even sometimes when there is a CORS error (see
previous installation issue for how to solve that).
If you are seeing the "Invalid CSRF Token" message (especially on every login), this is usually the result of a configuration / setup issue.
1. If your site had been working, and this error seems random, it is possible that the DSPACE-XSRF-COOKIE cookie in your browser just got "out of sync"
(this can occur if you are logging into the REST API and UI separately in the same browser).
a. Log out, log back in, and try the same action again. If it works this time, then that cookie was just "out of sync". If it fails a second time, then
there is likely a configuration issue... see suggestions below.
2. Make sure your backend is running HTTPS! This is the most common cause of this error. The only scenario where you can run the backend
in HTTP is when both the frontend & backend URLs are "localhost"-based URLs.
a. The reason for this HTTPS requirement is that most modern browsers will automatically block cross-domain cookies when using HTTP.
Cross-domain cookies are required for successful authentication. The only exception is when both the frontend and backend are using
localhost URLs (as in that scenario the cookies no longer need to be sent cross-domain). A more technical description of this behavior is
in the sub-bullets below.
i. If the REST API Backend is running HTTP, then it will always send the required DSPACE-XSRF-COOKIE cookie with a value of
SameSite=Lax. This setting means that the cookie will not be sent (by your browser) to any other domains. Effectively, this
will block all logins from any domain that is not the same as the REST API (as this cookie will not be sent back to the REST API
as required for CSRF validation). In other words, running the REST API on HTTP is only possible if the User Interface is
running on the exact same domain. For example, running both on 'localhost' with HTTP is a common development setup, and
this will work fine.
ii. In order to allow for cross-domain logins, you MUST enable HTTPS on the REST API. This will result in the DSPACE-XSRF-
COOKIE cookie being set to SameSite=None; Secure. This setting means the cookie will be sent cross domain, but only for
HTTPS requests. It also allows the user interface (or other client applications) to be on any domain, provided that the domain is
trusted by CORS (see rest.cors.allowed-origins setting in REST API)
3. Verify that your User Interface's "rest" section matches the value of "dspace.server.url" configuration on the Backend. This simply ensures
your UI is sending requests to the correct REST API. Also pay close attention that both specify HTTPS when necessary (see previous bullet).
4. Verify that your "dspace.server.url" configuration on the Backend matches the primary URL of the REST API (i.e. the URL you see in the
browser). This must be an exact match: mode (http vs https), domain, port, and subpath(s) all must match, and it must not end in a trailing slash
(e.g. "https://demo.dspace.org/server" is valid, but "https://demo.dspace.org/server/" may cause problems).
5. Verify that your "dspace.ui.url" configuration on the Backend matches the primary URL of your User Interface (i.e. the URL you see in the
browser). This must be an exact match: mode (http vs https), domain, port, and subpath(s) all must match, and it must not end in a trailing slash (e
.g. "https://demo.dspace.org" is valid, but "https://demo.dspace.org/" may cause problems).
6. Verify that nothing (e.g. a proxy) is blocking Cookies and HTTP Headers from being passed between the UI and REST API. DSpace's CSRF
protection relies on the client (User Interface) being able to return both a valid DSPACE-XSRF-COOKIE cookie and a matching X-XSRF-TOKEN
header back to the REST API for validation. See our REST Contract for more details https://github.com/DSpace/RestContract/blob/main/csrf-
tokens.md
7. If you are running a custom application, or accessing the REST API from the command-line (or other third party tool like Postman), you MUST
ensure you are sending the CSRF token on every modifying request. See our REST Contract for more details https://github.com/DSpace
/RestContract/blob/main/csrf-tokens.md
For additional information on how DSpace's CSRF Protection works, see our REST Contract at https://github.com/DSpace/RestContract/blob/main/csrf-
tokens.md
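As a sketch, the two backend settings checked in steps 4 and 5 might look like this in local.cfg (the demo site's URLs are used as examples; note the lack of trailing slashes):

```
# Must exactly match the REST API URL seen in the browser (no trailing slash)
dspace.server.url = https://demo.dspace.org/server
# Must exactly match the UI URL seen in the browser (no trailing slash)
dspace.ui.url = https://demo.dspace.org
```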
Using a Self-Signed SSL Certificate causes the Frontend to not be able to access the Backend
If you set up the backend to use HTTPS with a self-signed SSL certificate, then Node.js (which the frontend runs on) may not "trust" that certificate by default. This will result in the Frontend not being able to make requests to the Backend.
One possible workaround (untested as of yet) is to try setting the NODE_EXTRA_CA_CERTS environment variable (which tells Node.js to trust additional
CA certificates).
Another option is to avoid using a self-signed SSL certificate. Instead, create a real, issued SSL certificate using something like Let's Encrypt (or a similar free service).
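A minimal (untested) sketch of the NODE_EXTRA_CA_CERTS workaround, assuming your self-signed certificate is available as a PEM file:

```shell
# Tell Node.js to additionally trust this self-signed certificate.
# The path is a placeholder; set this in the environment before starting the UI.
export NODE_EXTRA_CA_CERTS=/path/to/self-signed-cert.pem
```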
My REST API is running under HTTPS, but some of its "link" URLs are switching to HTTP
This scenario may occur when you are running the REST API behind an HTTP proxy (e.g. Apache HTTPD's mod_proxy_http, Ngnix's proxy_pass or
any other proxy that is forwarding from HTTPS to HTTP).
The fix is to ensure the DSpace REST API is sent the X-Forwarded-Proto header (by your proxying service), telling it that the forwarded protocol is
HTTPS
X-Forwarded-Proto: https
In general, when running behind a proxy, the DSpace REST API depends on accurate X-Forwarded-* headers to be sent by that proxy.
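For example, with Apache HTTPD this header can be set by mod_headers alongside the proxy rules. This is a hypothetical sketch; the backend port 8080 and the /server path are assumptions that must match your Tomcat setup:

```apache
# Requires mod_headers and mod_proxy_http to be enabled.
# Tell the proxied REST API that the original request was HTTPS.
RequestHeader set X-Forwarded-Proto https
ProxyPass /server http://localhost:8080/server
ProxyPassReverse /server http://localhost:8080/server
```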
The fix is to ensure the DSpace User Interface (frontend) is sent the correct X-Forwarded-Proto and Host (or X-Forwarded-Host) headers to tell it the
correct hostname and scheme (HTTP or HTTPS)
ProxyPreserveHost on
RequestHeader set X-Forwarded-Proto https
If you are running DSpace on a Debian-based system (e.g. Ubuntu), some users have reported that it is necessary to grant "ReadWrite" access to Apache Tomcat (where the backend is running) via its service file (e.g. /lib/systemd/system/tomcat9.service). In the [Service] section you need to add something like this:
# Give Tomcat read/write on the DSpace installation
# Make sure to update the "/PATH/TO" to be the full path of your DSpace install
ReadWritePaths=/PATH/TO/dspace
# Set the "NODE_OPTIONS" environment variable on your system. This example will work for Linux/macOS
# Ensure the "max-old-space-size" is set to 4GB (4096MB) or greater.
export NODE_OPTIONS=--max-old-space-size=4096
NOTE: More discussion on this issue can be found in https://github.com/DSpace/dspace-angular/issues/2259 It appears to only occur on systems where
the default memory allocated for Node isn't sufficient to build DSpace in development mode.
This same setting may also be used in production scenarios to give Node.js more memory to work with. See Performance Tuning DSpace for more details.
Solr responds with "Expected mime type application/octet-stream but got text/html" (404 Not Found)
This error occurs when Solr is either not initialized properly, or your DSpace backend is unable to find/communicate with Solr. Here's a few things you
should double check:
1. Verify that Solr is running and/or check for errors in its logs. Try to restart it (usually via a command like [solr]/bin/solr restart), and
verify it's accessible via wget or a web browser (usually at a URL like http://localhost:8983/solr)
2. Verify that your solr.server setting (in local.cfg) is correct for your Solr installation. This should correspond to the main URL of your Solr site
(usually something like http://localhost:8983/solr). If you use wget or a browser from the machine running your DSpace backend, you
should get a response from that URL (it should return the Solr Admin UI).
3. Verify that the required DSpace Solr cores have been properly installed/configured (per installation instructions above). When properly installed,
you should be able to get a response from them. For example, the URL ${solr.server}/search/select should run an empty query against
the "search" core, returning an empty JSON result.
4. If Solr is running & you are sure solr.server is set properly, double check that nothing else could be blocking the DSpace backend from
accessing Solr. For instance, if Solr is on a separate machine, verify that there is no firewall or proxy that could be blocking access between the
DSpace backend machine and the Solr machine.
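The checks above can be sketched from the command-line as follows (the URL assumes a default local Solr install; adjust it to your solr.server setting):

```shell
# 1. Is Solr itself responding? (should return the Solr Admin UI / a 200 response)
wget -q -O /dev/null http://localhost:8983/solr && echo "Solr reachable"

# 2. Does the DSpace "search" core answer an empty query?
#    (should return an empty JSON result set, not a 404)
wget -q -O - "http://localhost:8983/solr/search/select"
```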
It usually means you haven't yet added the relevant configuration parameter to your PostgreSQL configuration (see above), or perhaps you
haven't restarted PostgreSQL after making the change. Also, make sure that the db.username and db.password properties are correctly set in
[dspace]/config/dspace.cfg. An easy way to check that your DB is working OK over TCP/IP is to try this on the command line:
Enter the dspace database password, and you should be dropped into the psql tool with a dspace=> prompt.
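For example (a sketch; the database name, user, and host are assumptions matching a default install):

```shell
# Connect to the "dspace" database as the "dspace" user over TCP/IP.
# You will be prompted for the password, then dropped into psql (dspace=> prompt).
psql -h localhost -U dspace dspace
```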
Another common error looks like this:
This means that the PostgreSQL JDBC driver is not present in [dspace]/lib. See above.
7.0-7.1 Frontend Installation
Frontend (UI) Installation Guide for 7.0 or 7.1 ONLY
In DSpace 7.0 and 7.1, the Frontend installation required configuring your UI before you could build the UI. This required a slightly different installation
process, and this installation process required a full rebuild whenever configurations changed. The configuration file and several scripts also had different
names. This page provides a guide for those who are still on 7.0 or 7.1.
This guide will not work for 7.2!
If you are running DSpace 7.2 (or later), you MUST instead follow the new Frontend Installation process documented at Installing DSpace. The new Frontend Installation instructions are easier and also allow you to rebuild/redeploy with minimal downtime.
Yarn (v1.x)
Yarn v1.x is available at https://classic.yarnpkg.com/. It can usually be installed via NPM (or through your Linux distribution's package manager). We do NOT currently support Yarn v2.
# You may need to run this command using "sudo" if you don't have proper privileges
npm install --global yarn
PM2 (or another Process Manager for Node.js apps) (optional, but recommended for Production)
In Production scenarios, we highly recommend starting/stopping the User Interface using a Node.js process manager. There are several
available, but our current favorite is PM2. The rest of this installation guide assumes you are using PM2.
PM2 is very easily installed via NPM
# You may need to run this command using "sudo" if you don't have proper privileges
npm install --global pm2
4. Create a Production Configuration file at [dspace-angular]/src/environments/environment.prod.ts. You may wish to use the
environment.common.ts as a starting point. This environment.prod.ts file can be used to override any of the default configurations listed in
the environment.common.ts (in that same directory). At a minimum this file MUST include a "rest" section (and may also include a "ui"
section), similar to the following (keep in mind, you only need to include settings that you need to modify).
a. (Optionally) Test the connection to your REST API from the UI from the command-line. This is not required, but it can sometimes help
you discover immediate configuration issues if the test fails.
i. In DSpace 7.1, this could be tested by running yarn config:check:rest This script will attempt a basic Node.js
connection to the REST API that is configured in your "environment.prod.ts" file and validate the response.
ii. A successful connection should return a 200 Response and all JSON validation checks should return "true"
iii. If you receive a connection error or different response code, you MUST fix your REST API before the UI will be able to
work. See also the Common Installation Issues. If you receive an SSL error, see "Using a Self-Signed SSL Certificate causes
the Frontend to not be able to access the Backend"
b. HINT #1: In the "ui" section above, you may wish to start with "ssl: false" and "port: 4000" just to be certain that everything else is
working properly. With those settings, you can quickly test your UI by running "yarn start" and trying to access it via
http://[mydspace.edu]:4000/ from your web browser. KEEP IN MIND, we highly recommend always using HTTPS for Production.
c. HINT #2: If Node throws an error saying "listen EADDRNOTAVAIL: address not available", try setting the "host" to "0.0.0.0" or
"localhost". Usually that error is a sign that the "host" is not recognized.
d. If there are other settings you know you need to modify in the sample environment.common.ts configuration file, you can also copy
them into this same file.
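A minimal environment.prod.ts of the kind step 4 describes might look like the following (the hostnames and ports are placeholders, not your real values; only override the settings you need to change):

```typescript
// Sketch of [dspace-angular]/src/environments/environment.prod.ts (7.0/7.1 format).
// All values below are placeholders and must match your actual backend/UI URLs.
export const environment = {
  // The REST API backend this UI should talk to
  rest: {
    ssl: true,
    host: 'demo.dspace.org',
    port: 443,
    nameSpace: '/server'
  },
  // Where this UI itself will listen
  ui: {
    ssl: false,
    host: 'localhost',
    port: 4000,
    nameSpace: '/'
  }
};
```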
5. Build the User Interface for Production. This uses your environment.common.ts and the source code to create a compiled version of the UI in
the [dspace-angular]/dist folder
a. In 7.1 or 7.0: anytime you change/update your environment.prod.ts, you will need to rebuild the UI application (i.e. rerun this
"yarn run build:prod" command).
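The production build referenced in this step is run from the [dspace-angular] directory:

```shell
# Compile the UI into [dspace-angular]/dist for production.
# Rerun this after any change to environment.prod.ts (7.0/7.1 only).
yarn run build:prod
```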
6. Assuming you are using PM2, create a JSON configuration file describing how to run our UI application. This need NOT be in the same directory
as the dspace-angular codebase itself (in fact you may want to put it in the parent directory or another location). Keep in mind the "cwd" setting (on
line 5) must be the full path to your [dspace-angular] folder.
dspace-angular.json
{
"apps": [
{
"name": "dspace-angular",
"cwd": "/home/dspace/dspace-angular",
"script": "yarn",
"args": "run serve:ssr",
"interpreter": "none"
}
]
}
a. Not using PM2? That's OK. The key command that your process manager should run is yarn run serve:ssr. This is the
command that starts the app (after it was built using yarn run build:prod)
b. Using Windows? This "dspace-angular.json" file needs to have a slightly different structure on Windows. First, all paths must include
double backslashes (e.g. C:\\dspace-angular). Second, "cluster" mode is required. Finally, because of a known issue in PM2, you must
point the "script" at the "npm/node_modules/yarn/bin/yarn.js" file directly. So, here's how this configuration looks on Windows platforms:
{
"apps": [
{
"name": "dspace-angular",
"cwd": "C:\\path\\to\\dspace-angular",
"script": "C:\\path\\to\\npm\\node_modules\\yarn\\bin\\yarn.js",
"args": "run serve:ssr",
"interpreter": "none",
"exec_mode": "cluster"
}
]
}
7. Now, start the application using PM2 using the configuration file you created in the previous step
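Assuming the JSON file above was saved as dspace-angular.json, starting the app is standard PM2 usage (a sketch):

```shell
# Start the UI under PM2 using the config file created in the previous step
pm2 start dspace-angular.json

# Verify that the "dspace-angular" process is online
pm2 status
```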
<VirtualHost _default_:443>
# ... set up your host how you want, including log settings ...
SSLEngine on
SSLCertificateFile [full-path-to-PEM-cert]
SSLCertificateKeyFile [full-path-to-cert-KEY]
b. (Alternatively) You can use the basic HTTPS support built into the dspace-angular node server. (This may currently be better suited for
non-Production environments, as it has not been well tested.)
i. Create a [dspace-angular]/config/ssl/ folder and add a key.pem and cert.pem to that folder (they must have those
exact names)
ii. Enable "ui.ssl" (set to true)
iii. Update your "ui.port" to be 443
1. In order to run Node/PM2 on port 443, you also will likely need to provide node with special permissions, like in this
example.
iv. Rebuild and then restart the app in PM2
v. Keep in mind, while this setup is simple, you may not have the same level of detailed Production logs as you would with
Apache HTTPD or Nginx.
10. Additional UI configurations are described in User Interface Configuration. A guide to customizing the look and feel or branding via a Theme is
also available in User Interface Customization
Upgrading DSpace
What versions does this guide cover?
These instructions are valid for any of the following upgrade paths:
Upgrading ANY prior version (1.x.x, 3.x, 4.x, 5.x, 6.x or 7.x) of DSpace to DSpace 7.x (latest version)
For more information about new features or major changes in previous releases of DSpace, please refer to following:
Releases - Provides links to release notes for all prior releases of DSpace
Version History - Provides detailed listing of all changes in all prior releases of DSpace
Upgrading to the latest release (covered by this guide). This provides a walkthrough of how to install the latest version of the code over top of
your existing DSpace installation, in order to upgrade to the latest version.
Migrating to the latest version. This separate guide provides a walkthrough of installing a new copy of DSpace and migrating your existing
production data into it. This approach may be more useful if you wish to move your DSpace to a different server, or want to start "fresh" with the
same data.
The approach you choose is up to you. Upgrading is often easiest for minor upgrades (e.g. 7.x to the latest 7.x). Migrating may be useful for major upgrades (e.g. 7.x to 8.x), or if you need to move your DSpace installation.
Please refrain from customizing the DSpace database tables. It will complicate your next upgrade!
As DSpace automatically upgrades your database structure (using FlywayDB migrations), we highly recommend AGAINST customizing the DSpace
database tables/structure or backporting any features that change the DSpace tables/structure. Doing so will often cause the automated database upgrade
process to fail (and therefore will complicate your next upgrade).
If you must add features requiring new database tables/structure, we recommend creating new tables (instead of modifying existing ones), as that is
usually much less disruptive to our automated database upgrade.
Test Your Upgrade Process
In order to minimize downtime, it is always recommended to first perform a DSpace upgrade using a Development or Test server. You should note any
problems you may have encountered (and also how to resolve them) before attempting to upgrade your Production server. It also gives you a chance to
"practice" at the upgrade. Practice makes perfect, and minimizes problems and downtime. Additionally, if you are using a version control system, such as
git, to manage your locally developed features or modifications, then you can do all of your upgrades in your local version control system on your
Development server and commit the changes. That way your Production server can checkout your well tested and upgraded code.
In the notes below [dspace] refers to the install directory for your existing DSpace installation, and [dspace-source] to the source directory for
DSpace. Whenever you see these path references, be sure to replace them with the actual path names on your local system.
Git is no longer required to build DSpace. You can remove Git from your system unless you require Git for your own local version control.
Hierarchical browse indexes for controlled vocabularies can now be disabled via a backend configuration. See Configuration Reference#HierarchicalBrowseIndexes for details.
DSpace 7.6 added some new features and many improvements and bug fixes. See the Release Notes for all the details. These major features are most
likely to impact your upgrade:
Oracle support has been removed as was previously announced in March 2022 on our mailing lists.
Item counts can now be displayed for all Communities/Collections similar to version 6.x.
New default Privacy Statement and End User Agreement. The new default text of these policies can be found by visiting the links in the footer
of our demo site.
DSpace 7.5 added some new features and many improvements and bug fixes. See the Release Notes for all the details. These major features are most
likely to impact your upgrade:
Subscribe to email updates from a Community or Collection - This REQUIRES enabling a new "subscription-send" Scheduled Task via
Cron.
New "dspace database skip" command - Useful tool for some sites upgrading from 6.x or below if they encounter this common migration error
User interface now provides page caching for better performance - Sites which have encountered slower UI performance may wish to use
the new settings in Cache Settings - Server Side Rendering (SSR)
User interface now requires Node.js 16 or 18. While it may work for other versions, it's best to no longer use Node.js 14, as it goes EOL in
April 2023
DSpace 7.4 added some new features and many improvements and bug fixes. See the Release Notes for all the details. These major features are most
likely to impact your upgrade:
Recent Submissions are now listed on the homepage. (See "recentSubmissions" options)
Thumbnails are now displayed in all search/browse screens. (See "showThumbnails" option)
DSpace 7.3 added many new features and improvements. See the Release Notes for all the details. These major features are most likely to impact your
upgrade:
Oracle Database support has been deprecated. It will be removed in mid-2023 (likely in the 7.6 release, which is tentative for June 2023). All
sites should plan to migrate to PostgreSQL. See https://github.com/DSpace/DSpace/issues/8214
ORCID Authentication and synchronization to a DSpace Researcher Profile now exists (requires first enabling Configurable Entities). See
ORCID Integration and Researcher Profiles
Admin "Health" menu provides basic control panel functionality (based on 6.x Control Panel). When logged in as an Administrator, select
"Health" from the side menu. You'll see a "Status" tab which provides useful information about the status of the DSpace backend, and an "Info"
tab which provides an overview of backend configurations and Java information.
Media Filter updates to use Apache Tika. There is now a single "Text Extractor" media filter which you should use (see updated settings in
dspace.cfg and Mediafilters for Transforming DSpace Content for more details).
DSpace 7.2 made some major changes to the User Interface build process, including a new configuration file format. See the Release Notes for all the
details. These major features are most likely to impact your upgrade:
New Configuration for User Interface to support Runtime Configuration: In the User Interface, the "environment.*.ts" configuration files have
been replaced with a new "config.*.yml" file. A migration script is provided which can migrate your UI configurations from the old format to the
new one. More information on that migration is available in the User Interface Configuration documentation.
Submission Process now supports Item Embargoes / access restrictions. It is disabled by default, but can be easily enabled by
uncommenting (or adding) the "itemAccessConditions" step in your item-submission.xml on the backend. See Submission User Interface and Emb
argo for more details.
Feedback Form now exists. It is enabled by default in the UI's footer as long as you set a "feedback.recipient" in your local.cfg on the backend.
OpenID Connect (OIDC) Authentication Plugin now exists. See the Authentication Plugins page for how to enable it.
Improved support for custom "Browse By" configurations. If you had previously configured custom "Browse by" types in your UI
configuration file, those settings can be removed. The "Browse by" types are now read dynamically from the REST API based on configured
indexes. See User Interface Configuration for more details.
DSpace 7.1 primarily added new features and bug fixes on top of 7.0. See the Release Notes for all the details. A few key changes to be aware of:
New Collection "Entity Type" Configuration: If you were using Configurable Entities in 7.0, the Entity Type to Collection "mapping" has been
moved to the Collection's "Edit Metadata" screen as a new "Entity Type" dropdown. This formalizes the recommended mapping between a
Collection and an Entity Type (so we highly recommend each Entity Type be stored in its own Collection). See the "Configure Collections for
each Entity type" section of the Configurable Entities documentation for more details.
DSpace 7.0 features some significant changes which you may wish to be aware of before beginning your upgrade:
XMLUI and JSPUI are no longer supported or distributed with DSpace. All users should install and utilize the new Angular User Interface.
See the "Installing the Frontend (User Interface)" instructions in Installing DSpace
The old REST API ("rest" webapp from DSpace v4.x-6.x) is deprecated and will be removed in v8.x. The new REST API (provided in the
"server" webapp) replaces all functionality available in the older REST API. If you have tools that rely on the old REST API, you can still
(optionally) build & deploy it alongside the "server" webapp via the "-Pdspace-rest" Maven flag. See REST API v6 (deprecated)
Solr must be installed separately due to changes in the packaging of recent Solr releases. The indexes have been reconfigured and must be
rebuilt. See below.
GeoIP location database must be installed separately due to changes in Maxmind's terms and conditions. MaxMind has changed the terms
and procedure for obtaining and using its GeoLite2 location database. Consequently, DSpace no longer automatically downloads the database
during installation or update, and the DSpace-specific database update tool has been removed. If you wish to (continue to) record client location
data in SOLR Statistics, you will need to make new arrangements. See below.
The Submission Form configuration has changed. The "item-submission.xml" file has changed its structure, and the "input-forms.xml" has
been replaced by a "submission-forms.xml". See Submission User Interface
The traditional, 3-step Workflow system has been removed in favor of the Configurable Workflow System . For most users, you should
see no effect or difference. The default setup for this Configurable Workflow System is identical to the traditional, 3-step workflow ("Approve
/Reject", "Approve/Reject/Edit Metadata", "Edit Metadata")
The old BTE import framework has been removed in favor of the Live Import Framework (features of BTE have been ported to Live Import).
ElasticSearch Usage Statistics have been removed. Please use SOLR Statistics or DSpace Google Analytics Statistics.
Configuration has been upgraded to Apache Commons Configuration version 2. For most users, you should see no effect or difference. No
DSpace configuration files were modified during this upgrade and no configurations or settings were renamed or changed. However, if you locally
modified or customized the [dspace]/config/config-definition.xml (DSpace's Apache Commons Configuration settings), you will need
to ensure those modifications are compatible with Apache Commons Configuration version 2. See the Apache Commons Configuration's
configuration definition file reference for more details.
Handle server has been updated to v9.3. Most users will see no effect or difference, however a minor change to the Handle Server
configuration is necessary, see below.
Many backend prerequisites have been upgraded to avoid "end of life" versions. Therefore, pay very close attention to the required
prerequisites listed below.
A large number of old/obsolete configurations were removed. See the "7.0 Configurations Removed" section in the Release Notes.
Database: Make a snapshot/dump of the database. For the PostgreSQL database use Postgres' pg_dump command. For example:
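For example, a database dump might look like this (a sketch assuming the database and user are both named "dspace"; the output path is a placeholder):

```shell
# Dump the entire DSpace PostgreSQL database to a single SQL file.
# Adjust the user (-U), output path (-f), and database name to your setup.
pg_dump -U dspace -f /path/to/backup/dspace-db.sql dspace
```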
Assetstore: Backup the directory ([dspace]/assetstore by default, and any other assetstores configured in
[dspace]/config/spring/api/bitstore.xml)
Configuration: Backup the entire directory content of [dspace]/config.
Customizations: If you have custom code, such as themes, modifications, or custom scripts, you will want to back them up to a safe location.
Statistics data: what to back up depends on what you were using before: the options are the default SOLR Statistics, or the legacy statistics.
Legacy stats utilizes the dspace.log files, while SOLR Statistics stores data in [dspace]/solr/statistics. A simple copy of the logs or the
Solr core directory tree should give you a point of recovery, should something go wrong in the update process. We can't stress this enough: your
users depend on these statistics more than you realize. You need a backup.
Authority data: stored in [dspace]/solr/authority. As with the statistics data, making a copy of the directory tree should enable recovery
from errors.
Refer to the Backend Requirements section of "Installing DSpace" for more details around configuring and installing these prerequisites.
If during the upgrade you are migrating your DSpace backend to a new server/machine, see Migrating DSpace to a new server guide for hints/tips.
1. Download the latest DSpace release from the DSpace GitHub Repository. You can choose to either download the zip or tar.gz file provided by
GitHub, or you can use "git" to checkout the appropriate tag (e.g. dspace-7.2) or branch.
a. Unpack it using "unzip" or "gunzip". If you have an older version of DSpace installed on this same server, you may wish to unpack the new release to a different location than your existing installation. This will ensure no files are accidentally overwritten during the unpacking process, and allow you to compare configs side by side.
b. For ease of reference, we will refer to the location of this unzipped version of the DSpace release as [dspace-source] in the remainder of
these instructions.
2. If upgrading from 6.x or below, a few extra steps are required before you install DSpace 7.x. If you are upgrading from a previous version
of 7.x, skip this and move along.
a. Ensure that your database is compatible: Starting with DSpace 6.x, there are new database requirements for DSpace (refer to the Backend Requirements section of "Installing DSpace" for full details).
i. PostgreSQL databases: PostgreSQL 9.4 or above is required and the "pgcrypto" extension must be installed.
1. Notes on installing pgcrypto
a. On most Linux operating systems (Ubuntu, Debian, RedHat), this extension is provided in the "postgresql-contrib" package in your package manager. So, ensure you've installed "postgresql-contrib".
b. On Windows, this extension should be provided automatically by the installer (check your "[PostgreSQL]/share/extension" folder for files starting with "pgcrypto")
2. Enabling pgcrypto on your DSpace database. (Additional options/notes in the Installation Documentation)
# Login to your "dspace" database as a superuser
psql --username=postgres dspace
# Enable the pgcrypto extension on this database
CREATE EXTENSION pgcrypto;
ii. Oracle databases: Oracle support has been deprecated in DSpace. It will no longer be supported as of June/July 2023. See https://github.com/DSpace/DSpace/issues/8214 for more details.
b. From your old version of DSpace, dump your authority and statistics Solr cores. (Only necessary if you want to keep both
your authority records and/or SOLR Statistics)
The dumps will be written to the directory [dspace]/solr-export. This may take a long time and require quite a lot of storage. In
particular, the statistics core is likely to be huge, perhaps double the size of the content of solr/statistics/data. You should
ensure that you have sufficient free storage.
This is not the same as the disaster-recovery backup that was done above. These dumps will be reloaded into new, reconfigured cores later.
If you were sharding your statistics data, you will need to dump each shard separately. The index names for prior years will be statistics-YYYY (for example: statistics-2017, statistics-2018, etc.). The current year's statistics shard is named statistics and you should dump that one too.
Unfortunately, the "solr-export-statistics" script was not created until DSpace 5.x. Therefore, you will not be able to upgrade statistics
from 4.x or below unless you first upgrade to either 5.x or 6.x. This upgrade could be done in a test environment, to allow you to export
your statistics (so they can be reimported into 7.x below). But, there's unfortunately no direct way to migrate 4.x (or 3.x or 1.x.x) Solr
Statistics into 7.x.
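As a sketch (assuming the default core names), the dumps can be produced with the solr-export-statistics tool from your old installation; the [dspace] path is a placeholder for your installation directory:

```shell
# Run from the OLD DSpace installation; output is written to [dspace]/solr-export
[dspace]/bin/dspace solr-export-statistics -i statistics
[dspace]/bin/dspace solr-export-statistics -i authority
```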
c. Move your old Solr cores to a safe location in case of trouble with the upgrade procedure. If you leave them in place, you will get a
mixture of old and new files that the new Solr will refuse to load.
d. (If upgrading from 5.x or below) Replace your old build.properties file with a local.cfg : As of DSpace 6.0, the build.properties
configuration file has been replaced by an enhanced local.cfg configuration file. Therefore, any old build.properties file (or
similar [dspace-source]/*.properties files) WILL BE IGNORED. Instead, you should create a new local.cfg file, based on the
provided [dspace-source]/dspace/config/local.cfg.EXAMPLE and use it to specify all of your locally customized DSpace
configurations. This new local.cfg can be used to override ANY setting in any other configuration file (dspace.cfg or modules/*.cfg). To override a default setting, simply copy the configuration into your local.cfg and change its value(s). For much more
information on the features of local.cfg, see the Configuration Reference documentation and the local.cfg Configuration File section on
that page.
cd [dspace-source]
cp dspace/config/local.cfg.EXAMPLE local.cfg
# Then edit the local.cfg, specifying (at a minimum) your basic DSpace configuration settings.
# Optionally, you may copy any settings from other *.cfg configuration files into your local.cfg to override them.
# After building DSpace, this local.cfg will be copied to [dspace]/config/local.cfg, where it will also be used at runtime.
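For illustration, a minimal local.cfg might override settings such as the following (all values shown are assumptions; substitute your own hostnames, ports, and credentials):

```
dspace.dir = /dspace
dspace.server.url = http://localhost:8080/server
dspace.ui.url = http://localhost:4000
db.url = jdbc:postgresql://localhost:5432/dspace
db.username = dspace
db.password = dspace
solr.server = http://localhost:8983/solr
```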
3. Build DSpace. Rebuild the DSpace installation package by running the following from your source directory:
cd [dspace-source]
mvn -U clean package
The above command will re-compile the DSpace source code and build its "installer". You will find the result in [dspace-source]/dspace/target/dspace-installer
Without any extra arguments, the DSpace installation package is initialized for PostgreSQL. If you use Oracle instead, you should build the
DSpace installation package as follows:
mvn -Ddb.name=oracle -U clean package
4. Stop Tomcat (or servlet container). Take down your servlet container.
a. For Tomcat, use the $CATALINA_HOME/bin/shutdown.sh script. (Many Unix-based installations will have a startup/shutdown script in the /etc/init.d or /etc/rc.d directories.)
5. Update your DSpace Configurations. Depending on the version of DSpace you are upgrading from, not all steps are required.
a. If you are upgrading from a prior version of DSpace 7.x, you will need to perform the following steps.
i. Check the db.dialect setting in your local.cfg. For PostgreSQL it should be, for example:
db.dialect = org.hibernate.dialect.PostgreSQL94Dialect
ii. You may wish to review the Release Notes for details about new features. There may be new configurations you may wish to
tweak to enable/disable those features.
iii. Make sure your existing 7.x local.cfg is in the source directory (e.g. [dspace-source]/dspace/config/local.cfg). That
way your existing 7.x configuration gets reinstalled alongside the new version of DSpace.
b. If you are upgrading from DSpace 6.x or below, you will need to perform these steps.
i. Review your customized configurations (recommended to be in local.cfg): As mentioned above, we recommend any local
configuration changes be placed in a local.cfg Configuration File. With any major upgrade some configurations may have
changed. Therefore, it is recommended to review all configuration changes that exist in the config directory, and its
subdirectories, concentrating on configurations you previously customized in your local.cfg. See also the Configuration
Reference.
ii. Remove obsolete configurations. With the removal of the JSPUI and XMLUI, a large number of server-side (backend)
configurations were made obsolete and were therefore removed between the 6.x and 7.0 release. A full list can be found in the
Release Notes.
iii. Remove BTE Spring configuration: If it exists, remove the [dspace]/config/spring/api/bte.xml Spring
Configuration. This file is no longer needed as the BTE framework was removed in favor of Live Import from external sources.
iv. Migrate or recreate your Submission configuration. As of DSpace 7, the submission configuration has changed. The
format of the "item-submission.xml" file has been updated, and the older "input-forms.xml" has been replaced by a new
"submission-forms.xml". You can choose to either start fresh with the new v7 configuration files, or you can use the steps
below to migrate your old configurations into the new format. See the Submission User Interface for more information
1. First, create a temporary folder to copy your old v6 configurations into
2. Copy your old (v5 or v6) "item-submission.xml" and "input-forms.xml" into that temporary folder
3. Run the command-line migration script to migrate them to v7 configuration files
4. The result will be two files. These are valid v7 configurations based on your original submission configuration files.
a. [dspace]/config/item-submission.xml.migrated
b. [dspace]/config/submission-forms.xml.migrated
5. These "*.migrated" files have no inline comments, so you may want to edit them further before installing them (by
removing the ".migrated" suffix). Alternatively, you may choose to copy sections of the *.migrated files into the default
configurations in the [dspace]/config/ folder, therefore retaining the inline comments in those default files.
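As a sketch of step 3 above, the command-line migration looks roughly like the following; the temporary folder path is a placeholder, and the exact script name and argument order should be verified against the Submission User Interface documentation:

```shell
# Assumes your old v5/v6 files were copied to [migration-tmp]
[dspace]/bin/dspace submission-forms-migrate [migration-tmp]/item-submission.xml [migration-tmp]/input-forms.xml
```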
v. City IP Database file for Solr Statistics has been renamed. The old [dspace]/config/GeoLiteCity.dat file is no
longer maintained by its provider. You can delete it. The new file is named GeoLite2-City.mmdb by default. If you have
configured a different name and/or location for this file, you should check the setting of usage-statistics.dbfile in [dspace]/config/modules/usage-statistics.cfg (and perhaps move your custom setting to local.cfg).
vi. tm-extractors media filtering (WordFilter) no longer exists: the PoiWordFilter plugin now fulfills this function. If you still
have WordFilter configured, remove from dspace.cfg and/or local.cfg all lines referencing org.dspace.app.
mediafilter.WordFilter and uncomment all lines referencing org.dspace.app.mediafilter.PoiWordFilter.
vii. Re-configure Solr URLs: change the value of solr.server to point at your new Solr external service. It will probably
become something like solr.server = http://localhost:8983/solr. Solr only needs to be accessible to the
DSpace backend, and should not be publicly available on the web. It can either be run on localhost or via a hostname (if run on
a separate server from the backend). Also review the values of
1. discovery.search.server
2. oai.solr.url
3. solr.authority.server
4. solr-statistics.server
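A sketch of the relevant local.cfg settings is shown below; the defaults derive from solr.server, but treat the exact default values as assumptions and verify them against your *.cfg files:

```
solr.server = http://localhost:8983/solr
# The following typically default to paths under ${solr.server};
# override them only if your setup differs:
# discovery.search.server = ${solr.server}/search
# oai.solr.url = ${solr.server}/oai
# solr.authority.server = ${solr.server}/authority
# solr-statistics.server = ${solr.server}/statistics
```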
viii. Sitemaps are now automatically generated/updated: A new sitemap.cron setting exists in the dspace.cfg which controls
when Sitemaps are generated. By default they are enabled to update once per day, for optimal SEO. See Search Engine
Optimization docs for more detail
1. Because of this change, if you had a system cron job which ran "./dspace generate-sitemaps", this system cron
job can be removed in favor of the new sitemap.cron setting.
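For example, a daily schedule in local.cfg might look like this (DSpace cron settings use Quartz-style expressions with a leading seconds field; the time shown is an arbitrary assumption):

```
# Generate/update sitemaps once per day at 01:15 server time
sitemap.cron = 0 15 1 * * ?
```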
c. If you are upgrading from DSpace 5.x or below, there are a few additional configuration changes to be aware of.
i. Search/Browse requires Discovery: As of DSpace 6, only Discovery (Apache Solr) is supported for search/browse. Support
for Legacy Search (using Apache Lucene) and Legacy Browse (using database tables) has been removed, along with all their
configurations.
ii. XPDF media filtering no longer exists: XPDF media filtering, deprecated in DSpace 5, has been removed. If you used this,
you will need to reconfigure using the remaining alternatives (e.g. PDF Text Extractor and/or ImageMagick PDF Thumbnail
Generator).
6. Update DSpace Installation. Update the DSpace installation directory with the new code and libraries. Issue the following commands:
cd [dspace-source]/dspace/target/dspace-installer
ant update
7. Upgrade your database (required for all upgrades). The DSpace code will automatically upgrade your database (from any prior version of
DSpace). By default, this database upgrade occurs automatically when you restart Tomcat (or your servlet container). However, if you have a
large repository or are upgrading across multiple versions of DSpace at once, you may wish to manually perform the upgrade (as it could take
some time, anywhere from 5-15 minutes for large sites).
a. (Optional) If desired, you can optionally verify which migrations have not yet been run on your database. You can use this to double
check that DSpace is recognizing your database version appropriately
[dspace]/bin/dspace database info
# If you are upgrading from 5.x or later, this will list all migrations
# which were previously run, along with any which are "PENDING" or "IGNORED"
# that need to be run to upgrade your database.
# If you are upgrading from 4.x or earlier, this will attempt to detect which
# version of DSpace you are upgrading from. Look for a line at the bottom
# that says something like:
# "Your database looks to be compatible with DSpace version ___"
b. (Optional) In some rare scenarios, if your database's "sequences" are outdated, inconsistent or incorrect, a database migration error may
occur (in your DSpace logs). While this is seemingly a rare occurrence, you may choose to run the "update-sequences" command
PRIOR to upgrading your database. If your database sequences are inconsistent or incorrect, this "update-sequences" command will
auto-correct them (otherwise, it will do nothing).
# If upgrading from DSpace 6 or below, this script had to be run via psql from
# [dspace]/etc/postgres/update-sequences.sql
# For example:
# psql -U [database-user] -f [dspace]/etc/postgres/update-sequences.sql [database-name]
# NOTE: It is important to run the "update-sequences" script which came with the OLDER version
# of DSpace (the version you are upgrading from)! If you've misplaced this older version of the
# script, you can download it from our codebase & run it via the "psql" command above.
# DSpace 6.x version of "update-sequences.sql": https://github.com/DSpace/DSpace/blob/dspace-6_x/dspace/etc/postgres/update-sequences.sql
# DSpace 5.x version of "update-sequences.sql": https://github.com/DSpace/DSpace/blob/dspace-5_x/dspace/etc/postgres/update-sequences.sql
c. (REQUIRED) Then, you can upgrade your DSpace database to the latest version of DSpace. (NOTE: check the DSpace log, [dspace]/log/dspace.log.[date], for any output from this command)
If you are upgrading from DSpace 6.x or below be sure you include the "ignored" parameter! There are database changes which
were previously optional but now are mandatory (specifically Configurable Workflow database changes).
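The upgrade itself is run via the "database" tool; the "ignored" form matches the command quoted in the Troubleshooting section at the end of this chapter:

```shell
# Upgrading from 6.x or below (note the required "ignored" parameter):
[dspace]/bin/dspace database migrate ignored
# Upgrading from a prior 7.x release:
[dspace]/bin/dspace database migrate
```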
d. If the database upgrade process fails or throws errors, then look at the "Troubleshooting Upgrade Issues" section below for possible tips/hints.
e. More information on the "database" command can be found in Database Utilities documentation.
By default, your site will be automatically reindexed after a database upgrade
If any database migrations are run (even during minor release upgrades), then by default DSpace will automatically reindex all content in your
site. This process is run automatically in order to ensure that any database-level changes are also immediately updated within the search/browse
interfaces. See the notes below under "Restart Tomcat (servlet container)" for more information.
However, you may choose to skip automatic reindexing. Some sites choose to run the reindex process manually in order to better control when/how it runs.
To disable automatic reindexing, set discovery.autoReindex = false in config/local.cfg or config/modules/discovery.cfg.
As you have disabled automatic reindexing, make sure to manually reindex your site by running [dspace]/bin/dspace index-discovery -b (this must be run after restarting Tomcat).
WARNING: It is not recommended to skip automatic reindexing, unless you will manually reindex at a later time, or have verified that a reindex is
not necessary. Forgetting to reindex your site after an upgrade may result in unexpected errors or instabilities.
8. Deploy Server web application: The DSpace backend consists of a single "server" webapp (in [dspace]/webapps/server ). You need to
deploy this webapp into your Servlet Container (e.g. Tomcat). Generally, there are two options (or techniques) which you could use...either
configure Tomcat to find the DSpace "server" webapp, or copy the "server" webapp into Tomcat's own webapps folder. For more information &
example commands, see the Installation Guide
a. Optionally, you may also install the deprecated DSpace 6.x REST API web application ("rest" webapp). If you previously used the DSpace 6.x REST API, for backwards compatibility the old, deprecated "rest" webapp is still available to install (in [dspace]/webapps/rest). It is NOT used by the DSpace UI/frontend. So, most users should skip this step.
9. If upgrading from a previous version of 7.x, a few extra steps may be required before starting Tomcat.
a. Update your Solr schema definition(s)
i. If you are upgrading from one 7.x release to another, you will need to update your 'search' Solr schema definition with the new
version (For example, in 7.1 a new "search.entitytype" field was added to this schema. In 7.6, a new "lastModified_dt" field was
added to this schema.).
2. Restart Solr so the updated schema is loaded:
[solr]/bin/solr restart
3. Rebuild your search index:
[dspace]/bin/dspace index-discovery -b
10. If upgrading from 6.x or below, a few extra steps are required before starting Tomcat. If you are upgrading from a previous version of 7.x,
skip this and move along.
a. Install new Solr cores and rebuild your indexes. (Required when upgrading from 6.x or below. This may be done after starting
Tomcat, but is required for DSpace 7.x to function properly.)
i. Copy the new, empty Solr cores to your new Solr instance.
cp -R [dspace]/solr/* [solr]/server/solr/configsets
chown -R solr:solr [solr]/server/solr/configsets
ii. Start Solr, or restart it if it is running, so that these new cores are loaded.
[solr]/bin/solr restart
iii. You can check the status of Solr and your new DSpace cores by using its administrative web interface. Browse to ${solr.server} (e.g. http://localhost:8983/solr/) to see if Solr is running well, then look at the cores by selecting (on the
left) Core Admin or using the Core Selector drop list.
1. For example, to test that your "search" core is setup properly, try accessing the URL ${solr.server}/search
/select. It should run an empty query against the "search" core, returning an empty JSON result. If it returns an
error, then that means your "search" core is missing or not installed properly.
iv. Load authority and statistics from the CSV dumps that you made earlier in Step 2 above.
If you had sharded your statistics, you will need to load the dump of each shard separately into the "statistics" core. DSpace 7
does not support Solr shards at this time. Unfortunately, this will involve renaming all CSV export files to remove the year (e.g.
rename "statistics-2012_export_2013-12_5.csv" to "statistics_export_2013-12_5.csv") and rerunning "[dspace]/bin/dspace solr-
import-statistics -i statistics". More advice on this process can be found in this dspace-tech mailing list thread.
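As a hypothetical sketch of that bulk rename (the file names below are fabricated for the demonstration; the parameter expansion strips the "-YYYY" year from each export's name):

```shell
# Demo in a scratch directory: create two fake sharded export files,
# then strip the "-YYYY" year suffix from the "statistics" prefix.
DEMO_DIR="${TMPDIR:-/tmp}/dspace-stats-rename-demo"
rm -rf "$DEMO_DIR" && mkdir -p "$DEMO_DIR"
touch "$DEMO_DIR/statistics-2012_export_2013-12_5.csv" \
      "$DEMO_DIR/statistics-2017_export_2017-01_0.csv"
for f in "$DEMO_DIR"/statistics-[0-9][0-9][0-9][0-9]_export_*.csv; do
  base="$(basename "$f")"
  # ${base#pattern} drops the shortest leading match, e.g.
  # "statistics-2012_export_2013-12_5.csv" -> "export_2013-12_5.csv"
  mv "$f" "$DEMO_DIR/statistics_${base#statistics-[0-9][0-9][0-9][0-9]_}"
done
ls "$DEMO_DIR"
```

After the loop, the files are named statistics_export_2013-12_5.csv and statistics_export_2017-01_0.csv, ready to be loaded into the single "statistics" core.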
v. For the Statistics core(s) only, upgrade legacy DSpace Object Identifiers (pre-6.4 statistics) to UUID Identifiers.
Again, if you had sharded your statistics, you will need to run this for each shard separately. See also SOLR Statistics
Maintenance#UpgradeLegacyDSpaceObjectIdentifiers(pre-6xstatistics)toDSpace6xUUIDIdentifiers
vi. Rebuild the oai and search cores.
If you have a great deal of content, this could take a long time.
b. Update Handle Server Configuration. (Required when upgrading from 6.x or below) Because we've updated to Handle Server v9, if you are using the built-in Handle server (most installations do), you'll need to add the following to the end of the server_config section of your [dspace]/handle-server/config.dct file (the only new line is the "enable_txn_queue" line):
"case_sensitive" = "no"
"storage_type" = "CUSTOM"
"storage_class" = "org.dspace.handle.HandlePlugin"
"enable_txn_queue" = "no"
i. Alternatively, you could re-run the ./dspace make-handle-config script, which is in charge of updating this config.dct
file.
c. (Optional) Set up IP to City database for location-based statistics. If you wish to (continue to) record the geographic origin of client
activity, you will need to install (and regularly update) one of the following:
i. Either, a copy of MaxMind's GeoLite City database (in MMDB format)
NOTE: Installing MaxMind GeoLite2 is free. However, you must sign up for a (free) MaxMind account in order to
obtain a license key to use the GeoLite2 database.
You may download GeoLite2 directly from MaxMind, or many Linux distributions provide the geoipupdate tool
directly via their package manager. You will still need to configure your license key prior to usage.
Once the "GeoLite2-City.mmdb" database file is installed on your system, you will need to configure its location as the
value of usage-statistics.dbfile in your local.cfg configuration file.
You can discard any old GeoLiteCity.dat database(s) found in the config/ directory (if they exist).
See the "Managing the City Database File" section of SOLR Statistics for more information about using a City
Database with DSpace.
ii. Or, you can alternatively use/install DB-IP's City Lite database (in MMDB format)
This database is also free to use, but does not require an account to download.
Once the "dbip-city-lite.mmdb" database file is installed on your system, you will need to configure its location as the
value of usage-statistics.dbfile in your local.cfg configuration file.
See the "Managing the City Database File" section of SOLR Statistics for more information about using a City
Database with DSpace.
d. Check your cron / Task Scheduler jobs. In recent versions of DSpace, some of the script names have changed.
i. Check the Scheduled Tasks via Cron documentation for details. If you have been using the dspace stats-util --optimize tool, it is no longer recommended and you should stop.
ii. WINDOWS NOTE: If you are running the Handle Server on a Windows machine, a new [dspace]/bin/start-handle-server.bat script is available to more easily start up your Handle Server.
11. Restart Tomcat (servlet container). Now restart your servlet container (Tomcat/Jetty/Resin) and test out the upgrade.
a. Upgrade of database: If you didn't manually upgrade your database in the previous step, then your database will be automatically
upgraded to the latest version. This may take some time (seconds to minutes), depending on the size of your repository, etc. Check the
DSpace log ([dspace]/log/dspace.log.[date]) for information on its status.
12. Reindexing of all content for search/browse: If your database was just upgraded (either manually or automatically), all the content in your
DSpace will be automatically re-indexed for searching/browsing. As the process can take some time (minutes to hours, depending on the size of
your repository), it is performed in the background; meanwhile, DSpace can be used as the index is gradually filled. But, keep in mind that not all
content will be visible until the indexing process is completed. Again, check the DSpace log ( [dspace]/log/dspace.log.[date]) for
information on its status.
a. If you wish to skip automatic reindexing, please see the Note above under the "Upgrade your Database" step.
b. When upgrading from 7.0/7.1/7.2 to 7.3, it is REQUIRED to reindex your content. If reindexing does not occur automatically, or you
disabled it, then run "./dspace index-discovery -b" to reindex your site.
13. Review / Update your scheduled tasks (e.g. cron jobs). For all features of DSpace to work properly, there are some scheduled tasks you
MUST setup to run on a regular basis. Some examples are tasks that help create thumbnails (for images), do full-text indexing (of textual content)
and send out subscription emails. See the Scheduled Tasks via Cron for more details.
a. When upgrading to 7.5 (or later) , you will want to make sure the new "subscription-send" task is added to your existing scheduled tasks
(in cron or similar). This new task is in charge of sending Email Subscriptions for any users who have subscribed to updates. (NOTE:
"subscription-send" replaces the older "sub-daily" task from 6.x or below). See the Scheduled Tasks via Cron for more details.
14. Install or Upgrade the new User Interface (see below)
d. Build the latest User Interface code for Production: This rebuilds the latest code into the [dspace-angular]/dist directory
yarn build:prod
e. If upgrading from 7.0 or 7.1, read the updated Installation documentation. As of 7.2, we now recommend deploying the compiled
User Interface (in [dspace-angular]/dist) to a different directory (which we refer to as [dspace-ui-deploy]) in order to keep
your running UI separate from the source code. While it's still possible to run the UI using "yarn start" or "yarn run serve:ssr" (both of
which use [dspace-angular]/dist), that older approach will mean that your site goes down / becomes unavailable anytime you
rebuild (yarn build:prod). To solve this issue:
i. Create a separate [dspace-ui-deploy] location as described in the Installation documentation.
ii. Copy the [dspace-angular]/dist folder to that location, as described in the Installation documentation.
iii. Update your PM2 configuration or local startup scripts to use Node.js instead of Yarn. Again, see the Installation
documentation.
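For reference, a minimal PM2 app declaration pointing at the deployed build might look like this (the deploy path and entry-script location are assumptions based on a typical [dspace-ui-deploy] layout; see the Installation documentation for the authoritative example):

```
{
  "apps": [
    {
      "name": "dspace-ui",
      "cwd": "/home/dspace/dspace-ui-deploy",
      "script": "dist/server/main.js",
      "env": { "NODE_ENV": "production" }
    }
  ]
}
```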
f. If upgrading from 7.0 or 7.1, migrate your UI Configurations to YAML. In 7.2, the format of the UI configuration file changed from
Typescript to YAML to support runtime configuration. This means that the older ./src/environment/environment.*.ts
configuration files have all been replaced by corresponding ./config/config.*.yml configuration files (e.g. environment.prod.ts becomes config.prod.yml).
i. Either, manually move your "environment.prod.ts" configurations into a new "./config/config.prod.yml" file, using the "./config/config.example.yml" as a guide, along with the User Interface Configuration documentation.
ii. OR, you can migrate your configurations using the provided "yarn env:yaml" migration script. For detailed instructions, see the "Migrate environment file to YAML" section of the User Interface Configuration documentation.
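To illustrate the YAML format, a small config.prod.yml fragment might look like the following (the hostname and values are placeholders; check the key structure against config.example.yml):

```yaml
# Where the UI finds the REST API (backend)
rest:
  ssl: true
  host: api.example.edu
  port: 443
  nameSpace: /server
```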
g. (Optional) Review Configuration changes to see if you wish to update any new configurations
i. In 7.1, we added the ability to "extend" themes in the "themes" section. See the User Interface Configuration documentation for
details.
ii. In 7.2, themes now support optional "headTags" which can be used to customize favicons per theme (see User Interface
Customization). Additionally, "browseBy > types" configurations were removed, as they are now dynamically retrieved from the
REST API (see User Interface Configuration).
iii. In 7.3, you can optionally enable Item Access labels in the UI to display the status of an Item as a badge (e.g. "open access",
"restricted", "metadata only" or "embargoed"). See User Interface Configuration
iv. In 7.4, many new user interface configurations were added. See Release Notes and User Interface Configuration for details.
v. In 7.5, a new "Server Side Rendering" page caching option was added which can drastically speed up the initial response of
your site. Other new settings also were added. See Release Notes and User Interface Configuration for details.
h. Update your theme (if necessary), if you've created a custom theme in "src/themes" (or modified the existing "custom" or "dspace"
themes in that location). Pay close attention to the following...
i. In 7.3, a new "eager-theme.module.ts" and "lazy-theme.module.ts" has been added to both the "custom" and "dspace" themes
to improve performance. Make sure to copy those to your custom theme. Additionally, this new "eager-theme.module.ts" for
your theme MUST be imported/enabled in "src/themes/eager-themes.module.ts". For example, for a local theme under "src/themes/my-theme":
src/themes/eager-themes.module.ts
@NgModule({
  imports: [
    // ... existing theme modules ...
    MyThemeEagerThemeModule,
  ],
})
ii. Additional minor changes may have been made. It's usually best to look for changes to whichever theme you started from. If
you started your theme from the "custom" theme, look for any new changes made under "/src/themes/custom". If you started
your theme from the "dspace" theme, look for any new changes made under "/src/themes/dspace".
1. Using a tool like "git diff" from the commandline is often an easy way to see changes that occurred only in that
directory.
# Example which will show all the changes to "src/themes/dspace" (and all subfolders)
# between dspace-7.4 (tag) and dspace-7.5 (tag)
git diff dspace-7.4 dspace-7.5 -- src/themes/dspace/
iii. For the "custom" theme, the largest changes are often:
1. New themeable components (subdirectories) may be added under "src/themes/custom/app", allowing you the ability to
now change the look & feel of those components.
2. The "src/themes/custom/theme.module.ts" file will likely have minor updates. This file registers any new themeable
components (in the "const DECLARATIONS" section), and also registers new Modules, i.e. new UI features, (in the "@
NgModule" "imports" section). Make sure those sections are updated in your copy of this file!
3. Sometimes, new styles may be added in the "styles" folder, or new imports to "styles/theme.scss"
4. If you have locally customized the styles or look & feel of any component, you should also verify that the component
itself (in src/app) hasn't had updates.
iv. For the "dspace" theme, the largest changes are often:
1. Existing customized components (subdirectories) under "src/themes/dspace/app/" may have minor updates, if
improvements were made to that component.
2. The "src/themes/custom/theme.module.ts" file will likely have minor updates. This file registers any new themeable
components (in the "const DECLARATIONS" section), and also registers new Modules, i.e. new UI features, (in the "@
NgModule" "imports" section). Make sure those sections are updated in your copy of this file!
3. Sometimes, new styles may be added in the "styles" folder, or new imports to "styles/theme.scss"
4. If you have locally customized the styles or look & feel of any additional component, you should also verify that the
component itself (in src/app) hasn't had updates.
i. Restart the User Interface.
i. If you are using PM2 as described in the Installing DSpace instructions, you'd stop it and then start it back up as follows:
# If you had to update your PM2 configs, you may need to delete your old configuration from PM2 first:
# pm2 delete dspace-ui.json
# Start it back up
pm2 start dspace-ui.json
ii. If you are using a different approach, you simply need to stop the running UI, and re-run:
Database migrate errors: "Migration V5.7_2017.04.11__DS-3563_Index_metadatavalue_resource_type_id_column.sql failed" or "Migration V5.7_2017.05.05__DS-3431.sql failed"
If you are upgrading to DSpace 7.x and receive either of the two errors named above after running "./dspace database migrate ignored":
This means your database never ran those older migrations during a past upgrade from 5.x to 6.x (or similar).
Luckily, these migrations are both obsolete in DSpace 7.x (and later), so you can safely skip them.
As of DSpace 7.5, a new "./dspace database skip" command is provided to easily skip one (or both) of these failing migrations as follows:
For more information on the "./dspace database skip" command see Database Utilities.
cd [dspace]/bin/
./dspace registry-loader -metadata ../config/registries/dcterms-types.xml
./dspace registry-loader -metadata ../config/registries/dublin-core-types.xml
./dspace registry-loader -metadata ../config/registries/eperson-types.xml
./dspace registry-loader -metadata ../config/registries/local-types.xml
./dspace registry-loader -metadata ../config/registries/sword-metadata.xml
./dspace registry-loader -metadata ../config/registries/workflow-types.xml
Migrating DSpace to a new server
These instructions are meant as a general guideline to how you can migrate your DSpace site/data to a new server while also Upgrading DSpace to the
latest release. Keep in mind that you MUST also review the Installing DSpace and Upgrading DSpace guides when performing a migration (e.g. you must
ensure you have correct dependencies installed and you must ensure you perform all upgrade steps).
1. Install a fresh copy of DSpace & migrate the database/files into it - This is the approach documented on this page. It is the recommended
approach as it ensures zero data loss. However, it does involve more steps to complete the migration.
2. Install a fresh copy of DSpace & use AIP Backup and Restore - This is an alternative approach where you can use the AIP export tools to export AIPs from your old site, and then import them into the new site. While this also works, keep in mind that not all DSpace data can be exported to AIPs, so you will lose some data during this migration (namely, any submissions not yet completed or still in workflow approval will be lost; see the AIP Backup and Restore documentation for more details on what data is currently missing from AIPs).
You can also use this time to get your basic configurations setup properly for both the backend (local.cfg) and the frontend (config.prod.yml).
Step 2: Prepare your data to copy from the old DSpace to the new one
There are three main areas of data you need to migrate in order to ensure no data loss.
1. First, you should STOP tomcat on the old server. These steps require the site to be down.
2. Update sequences (optional) - When migrating content, sometimes sites will find that database sequences will be outdated or incorrect. This can
result in "duplicate key" errors during the database migration to the latest version. To avoid this, before you export your data, run this older copy
of the "update-sequences" command on your database. This should ensure your database sequences are updated before you dump your data.
a. NOTE: It is important to run the "update-sequences" script which came with the OLDER version of DSpace (the version you are
migrating from)! If you've misplaced this older version of the script, you can download it from our codebase & run it via the "psql"
command above.
i. DSpace 6.x version of "update-sequences.sql": https://github.com/DSpace/DSpace/blob/dspace-6_x/dspace/etc/postgres
/update-sequences.sql
ii. DSpace 5.x version of "update-sequences.sql": https://github.com/DSpace/DSpace/blob/dspace-5_x/dspace/etc/postgres
/update-sequences.sql
3. The database data - Make sure to export the database data from your old DSpace site using a tool like "pg_dump" (for PostgreSQL). If you use "pg_dump", you'll end up with a large SQL file which contains all the data from your old database.
4. The "assetstore" folder - This folder is in your DSpace installation directory and it contains all the files stored in your DSpace. You will need all
the contents of this folder (including all subdirectories), so you could choose to zip it up or you could copy it over directly.
5. The Solr data (optional) - Both DSpace authority and statistics are stored in Solr. If you want to keep these, you will want to export them from the
old Solr and move them over. Use the "solr-export-statistics" tool provided with DSpace: see "Export SOLR Statistics" in the Solr Statistics
Maintenance guide. (Requires Solr to be running. Keep in mind, this may require you to start Tomcat back up if Solr is running in Tomcat.)
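For a PostgreSQL-backed site, the database and assetstore export steps above might look like the following sketch. The usernames, database names, and paths are hypothetical placeholders; adjust them to your own setup.

```shell
# Dump the old database to a single SQL file (hypothetical username/database names)
pg_dump -U dspace -h localhost dspace > /tmp/dspace-db-export.sql

# Bundle the assetstore folder (including all subdirectories) for transfer
# ("/path/to/old-dspace" is a placeholder for your old DSpace installation directory)
tar czf /tmp/assetstore.tar.gz -C /path/to/old-dspace assetstore
```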
Step 3: Copy over the prepared data and import it into the new DSpace
Copy the data you've prepared in Step 2 over to the new server.
Now, you'll import this data into your new installation of DSpace (created in Step 1).
1. First, you must STOP Tomcat on the new server.
2. The database data - Before you can import the data, you must delete the new, empty database.
a. Delete/Clean the new, empty database (created in step 1) as you will have empty tables created during the installation. The easiest way to achieve this is to run the "./dspace database clean" command. Keep in mind it requires temporarily enabling it via "db.cleanDisabled=false" in your local.cfg. (After the "clean" command succeeds, make sure to remove this configuration.)
i. Alternatively, PostgreSQL users could delete the entire database (using dropdb command, e.g. "dropdb -U [db_username]
[db_name]") and recreate it based on the "Database Setup" instructions in Installing DSpace.
b. Import the database dump you created in Step 2 (above), which will recreate this database with all your old data in it. For Postgres, you
can use the "psql" command.
# Example of using psql to import data from a SQL file into a database
psql -U [db_username] [db_name] < [output_file.sql]
(NOTICE the direction of the angle bracket... in this command you are telling Postgres to execute all the commands contained in your "output_file.sql", which will cause it to recreate all the database data in your new database.)
3. The "assetstore" folder - Delete the empty assetstore folder on the new server. Copy the entire assetstore folder (and all subdirectories) from the old server to the new one. In the end, you should have several subdirectory hierarchies (containing your files) under the [dspace]/assetstore/ folder on the new server.
4. The Solr data (optional) - If you exported the statistics or authority data in Step 2, then you can import this data from the exported files using the
"solr-import-statistics" tool provided with DSpace, see "Import SOLR Statistics" in the Solr Statistics Maintenance guide. (Requires Solr to be
running)
1. Migrate/Upgrade the database to the latest version - Now that your old data is migrated, you MUST ensure it's using the latest database updates
based on the new DSpace you've installed. Review the database steps in Upgrading DSpace and follow the instructions there.
NOTE: You should check the logs (dspace.log) for errors. Additional steps may be documented in the Upgrading DSpace guide.
2. Start Tomcat. This will bring your new DSpace back up, with the migrated data in place. Check the backend logs (dspace.log and Tomcat log) to
ensure no errors occur on startup.
3. Reindex all content - This will ensure all search/browse functionality works in the DSpace site. Optionally, if you use OAI-PMH, you will want to reindex content there as well.
NOTE: Until this command completes (it may take a while for large sites), you will not be able to fully browse/search the content from the User
Interface. To check the progress of the reindex, check your dspace.log file.
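The reindex described above is run from the backend's bin directory. A sketch of the commands (both are documented DSpace CLI tools; the OAI step applies only if you use OAI-PMH):

```shell
cd [dspace]/bin/
# Rebuild the Discovery (search/browse) index from scratch
./dspace index-discovery -b
# Optionally, if you use OAI-PMH, clear and reimport the OAI index as well
./dspace oai import -c
```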
At this time, you also may wish to review your configurations on your old DSpace site, and see if there are any configurations that you wish to copy over
into your new DSpace site. This step is optional, as you can also choose to start "fresh" with a new local.cfg file.
FINALLY, test the new site and verify that all the content, user accounts, etc. have moved over successfully. If you encounter any issues, see our Troubleshoot an error guide for hints/tips on finding the underlying error message & reporting it to Support lists/channels. Also make sure to check our list of Common Installation Issues in the Installing DSpace guide.
Using DSpace
This page offers access to all aspects of the documentation relevant to using DSpace after it has been properly installed or upgraded. These pages
assume that DSpace is functioning properly. Please refer to the section on System Administration if you are looking for information on diagnosing DSpace
issues and measures you can take to restore your DSpace to a state in which it functions properly.
Authentication and Authorization
Authentication Plugins
Bulk Access Management
Embargo
Managing User Accounts
Request a Copy
Authentication Plugins
1 Stackable Authentication Method(s)
1.1 Authentication by Password
1.1.1 Enabling Authentication by Password
1.1.2 Configuring Authentication by Password
1.2 Open ID Connect (OIDC) Authentication
1.2.1 Enabling OIDC Authentication
1.2.2 Configuring OIDC Authentication
1.2.2.1 Sample/Test OIDC Configuration
1.3 Shibboleth Authentication
1.3.1 Enabling Shibboleth Authentication
1.3.2 Configuring Shibboleth Authentication
1.3.2.1 Apache "mod_shib" Configuration (required)
1.3.2.2 Sample shibboleth2.xml Configuration
1.3.2.3 Sample attribute-map.xml Configuration (for samltest.id)
1.3.2.4 DSpace Shibboleth Configuration Options
1.4 LDAP Authentication
1.4.1 Introduction to LDAP specific terminology
1.4.2 Enabling LDAP Authentication
1.4.3 Configuring LDAP Authentication
1.4.4 Debugging LDAP connection and configuration
1.4.5 Enabling Hierarchical LDAP Authentication
1.4.6 Configuring Hierarchical LDAP Authentication
1.5 ORCID Authentication
1.5.1 Enabling ORCID Authentication
1.6 IP Authentication
1.6.1 Enabling IP Authentication
1.6.2 Configuring IP Authentication
1.7 X.509 Certificate Authentication
1.7.1 Enabling X.509 Certificate Authentication
1.7.2 Configuring X.509 Certificate Authentication
1.8 Example of a Custom Authentication Method
Configuration File: [dspace]/config/modules/authentication.cfg
Property: plugin.sequence.org.dspace.authenticate.AuthenticationMethod
Example Value: plugin.sequence.org.dspace.authenticate.AuthenticationMethod = org.dspace.authenticate.PasswordAuthentication
An authentication method is a class that implements the interface org.dspace.authenticate.AuthenticationMethod. It authenticates a user
by evaluating the credentials (e.g. username and password) he or she presents and checking that they are valid.
Authentication by Password
However, to enable Authentication by Password, you must ensure the org.dspace.authenticate.PasswordAuthentication class is listed as one
of the AuthenticationMethods in the following configuration:
Configuration File: [dspace]/config/modules/authentication.cfg
Property: plugin.sequence.org.dspace.authenticate.AuthenticationMethod
Example Value: plugin.sequence.org.dspace.authenticate.AuthenticationMethod = org.dspace.authenticate.PasswordAuthentication
Use of inbuilt e-mail address/password-based log-in. This is achieved by sending login information to the "/api/authn/login" endpoint of the REST API, in order to obtain a JSON Web Token. This JSON Web Token must be sent on every subsequent request which requires authentication.
Users can register themselves (i.e. add themselves as e-people without needing approval from the administrators), and can set their own
passwords when they do this
Users are not members of any special (dynamic) e-person groups
You can restrict the domains from which new users are able to register. To enable this feature, uncomment the following line from dspace.cfg: authentication.password.domain.valid = example.com. Example options might be '@example.com' to restrict registration to users with addresses ending in @example.com, or '@example.com, .ac.uk' to restrict registration to users with addresses ending in @example.com or with addresses in the .ac.uk domain.
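As a rough illustration of the login flow described above, here is a sketch using curl. The server URL and credentials are placeholders, and note that DSpace 7's REST API also expects a CSRF token (DSPACE-XSRF-TOKEN) to be echoed back on modifying requests; treat the exact header/parameter details as assumptions to verify against the REST API contract.

```shell
# 1. Request any endpoint to receive a CSRF token (returned in the DSPACE-XSRF-TOKEN header/cookie)
curl -s -i "https://your.dspace.server/server/api/authn/status"

# 2. POST credentials to /api/authn/login, echoing the CSRF token back;
#    the JSON Web Token is returned in the "Authorization: Bearer ..." response header
curl -s -i -X POST "https://your.dspace.server/server/api/authn/login" \
  -H "X-XSRF-TOKEN: <token-from-step-1>" \
  --data-urlencode "user=you@example.com" \
  --data-urlencode "password=your-password"

# 3. Send that token on every later request which requires authentication
curl -s "https://your.dspace.server/server/api/authn/status" \
  -H "Authorization: Bearer <jwt-from-step-2>"
```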
Property: user.registration
Informational Note: This option allows you to disable all self-registration. When set to "false", no one will be able to register new accounts with your
system. Default is "true".
Property: authentication-password.domain.valid
Informational Note: This option allows you to limit self-registration to email addresses ending in a particular domain value. The above example
would limit self-registration to individuals with "@mit.edu" email addresses and all ".ac.uk" email addresses. (This setting only
works when user.registration=true)
Property: authentication-password.login.specialgroup
Informational Note: This option allows you to automatically add all password authenticated user sessions to a specific DSpace Group (the group
must exist in DSpace) for the remainder of their logged in session.
Property: authentication-password.digestAlgorithm
Informational Note: This option specifies the hashing algorithm to be used in converting plain-text passwords to more secure password digests.
The example value is the default. You may select any digest algorithm available through java.security.MessageDigest on your
system. At least MD2, MD5, SHA-1, SHA-256, SHA-384, and SHA-512 should be available, but you may have installed others.
Most sites will not need to adjust this.
Property: authentication-password.regex-validation.pattern
Informational Note: This option specifies a regular expression which all new passwords MUST validate against. By default, DSpace just requires a
new password to be 8 or more characters (see above example value). However, sites can modify this regex in order to require
more robust passwords of all users. One example of a complex rule is:
authentication-password.regex-validation.pattern = ^(?=.*?[a-z])(?=.*?[A-Z])(?=\\S*?[0-9])(?=\\S*?[!?$@#$%^&+=]).{8\,15}$
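You can sanity-check a candidate pattern before deploying it, for example with grep's PCRE mode (which, like Java, supports lookaheads). Note that the doubled backslashes and escaped comma in the cfg file are properties-file escaping; the raw regex is used below, and the sample passwords are made up.

```shell
# The raw regex (cfg-file escaping removed): at least one lowercase, one uppercase,
# one digit, one special character, and 8-15 characters total
pattern='^(?=.*?[a-z])(?=.*?[A-Z])(?=\S*?[0-9])(?=\S*?[!?$@#$%^&+=]).{8,15}$'
echo 'Passw0rd!' | grep -qP "$pattern" && echo "accepted"   # meets all rules
echo 'password'  | grep -qP "$pattern" || echo "rejected"   # no uppercase/digit/special
```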
Configuration File: [dspace]/config/modules/authentication.cfg
Property: plugin.sequence.org.dspace.authenticate.AuthenticationMethod
Example Value: plugin.sequence.org.dspace.authenticate.AuthenticationMethod = org.dspace.authenticate.OidcAuthentication
(WARNING: it's easy to miss, but the "camel case" of OidcAuthentication might catch you off guard. It's important to not use OIDCAuthentication in this line, because that class does not exist. Case matters.)
Configuration File: [dspace]/config/modules/authentication-oidc.cfg
Property: authentication-oidc.auth-server-url
Informational Note: (Optional) The root URL of the OpenID Connect server. This is optional, as it's only used to fill out each of the "-endpoint" configs below (see below). So, for some setups, it may be easier to configure the "-endpoint" configs directly INSTEAD OF the "auth-server-url" and "auth-server-realm".
Property: authentication-oidc.auth-server-realm
Informational Note: (Optional) The realm to authenticate against on the OpenID Connect server. This is optional, as it's only used to fill out each of the "-endpoint" configs below (see below). So, for some setups, it may be easier to configure the "-endpoint" configs directly INSTEAD OF the "auth-server-url" and "auth-server-realm".
Property: authentication-oidc.token-endpoint
Informational Note: (Required) The URL of the OIDC Token endpoint. This defaults to using the configured "auth-server-url" and "auth-server-realm" to determine the likely OIDC path for this endpoint (see example above for the default value). However, if that default path is incorrect, you may choose to hardcode the correct URL in this field.
Property: authentication-oidc.authorize-endpoint
Informational Note: (Required) The URL of the OIDC Authorize endpoint. This defaults to using the configured "auth-server-url" and "auth-server-realm" to determine the likely OIDC path for this endpoint (see example above for the default value). However, if that default path is incorrect, you may choose to hardcode the correct URL in this field.
Property: authentication-oidc.user-info-endpoint
Informational Note: (Required) The URL of the OIDC Userinfo endpoint. This defaults to using the configured "auth-server-url" and "auth-server-realm" to determine the likely OIDC path for this endpoint (see example above for the default value). However, if that default path is incorrect, you may choose to hardcode the correct URL in this field.
Property: authentication-oidc.client-id
Informational Note: (Required) The registered OIDC client id for our DSpace server's use. No default value.
Property: authentication-oidc.client-secret
Informational Note: (Required) The registered OIDC client secret for our DSpace server's use. No default value.
Property: authentication-oidc.redirect-url
Informational Note: The URL users will be redirected to after a successful login. The example above is the default value, and it usually does not need to be updated.
Property: authentication-oidc.scopes
Informational Note: The scopes to request from the OIDC server. The example above is the default value.
Property: authentication-oidc.can-self-register
Informational Note: Specify if the user can self register using OIDC (true|false). If not specified, true is assumed. If this is set to false, then only users with an existing EPerson in DSpace will be able to authenticate through OIDC. When set to true, an EPerson will be automatically created for each person who successfully authenticates through OIDC.
Property: authentication-oidc.user-info.email
Informational Note: Specify the attribute present in the user info json related to the user's email. The default value is "email".
Property: authentication-oidc.user-info.first-name
Informational Note: Specify the attribute present in the user info json related to the user's first/given name. The default value is "given_name".
Property: authentication-oidc.user-info.last-name
Informational Note: Specify the attribute present in the user info json related to the user's last/family name. The default value is "family_name".
One way to easily test OIDC Authentication is to use the PhantAuth test site at https://www.phantauth.net/. This site allows you to create a random OIDC
client & a random OIDC user to authenticate as. So, it can be used to verify that DSpace's OIDC authentication is working in your system, but obviously is
only meant for development/testing purposes.
Configuring DSpace to use PhantAuth for authentication just requires the following updates to your local.cfg:
# Enable OIDC
plugin.sequence.org.dspace.authenticate.AuthenticationMethod = org.dspace.authenticate.OidcAuthentication
# Because PhantAuth uses random users, you MUST ensure self registration is enabled
# (This is the default setting though, which is why it's commented out)
# authentication-oidc.can-self-register = true
Shibboleth Authentication
Configuration File: [dspace]/config/modules/authentication.cfg
Property: plugin.sequence.org.dspace.authenticate.AuthenticationMethod
Example Value: plugin.sequence.org.dspace.authenticate.AuthenticationMethod = org.dspace.authenticate.ShibAuthentication
Before DSpace will work with Shibboleth, you must have the following:
1. An Apache web server with the "mod_shib" module installed. As mentioned, this mod_shib module acts as a proxy for all HTTP requests for your servlet container (typically Tomcat). Any requests to DSpace that require authentication via Shibboleth should be redirected to 'shibd' (the Shibboleth daemon) by this "mod_shib" module. Details on installing/configuring mod_shib in Apache are available at: https://wiki.shibboleth.net/confluence/display/SHIB2/NativeSPApacheConfig. We also have a sample Apache + mod_shib configuration provided below.
2. An external Shibboleth IdP (Identity Provider). Using mod_shib, DSpace will only act as a Shibboleth SP (Service Provider). The actual
Shibboleth Authentication & Identity information must be provided by an external IdP. If you are using Shibboleth at your institution already, then
there already should be a Shibboleth IdP available. More information about Shibboleth IdPs versus SPs is available at: https://wiki.shibboleth.net
/confluence/display/SHIB2/UnderstandingShibboleth
For more information on installing and configuring a Shibboleth Service Provider see: https://wiki.shibboleth.net/confluence/display/SHIB2/Installation
When configuring your Shibboleth Service Provider there are two Shibboleth paradigms you may use: Active or Lazy Sessions. Active sessions is where the mod_shib module is configured to protect an entire URL space. No one will be able to access those URLs without first authenticating with Shibboleth. Using this method you will need to configure Shibboleth to protect the URL: "/shibboleth-login". The alternative, Lazy Session, does not protect any specific URL. Instead Apache will allow access to any URL, and the application may initiate an authenticated session when it wants to.
The Lazy Session method is preferable for most DSpace installations, as you usually want to provide public access to (most) DSpace content, while
restricting access to only particular areas (e.g. administration UI/tools, private Items, etc.). When Active Sessions are enabled your entire DSpace site will
be access restricted. In other words, when using Active Sessions, Shibboleth will require everyone to first authenticate before they can access any part of
your repository (which essentially results in a "dark archive", as anonymous access will not be allowed).
In Debian based environments, "mod_shib" tends to be in a package named something like "libapache2-mod-shib2"
The Shibboleth setting "ShibUseHeaders" is no longer required to be set to "On", as DSpace will correctly utilize attributes instead of headers. When "ShibUseHeaders" is set to "Off" (which is recommended in the mod_shib documentation), proper configuration of Apache to pass attributes to Tomcat (via either mod_jk or mod_proxy) can be a bit tricky; SWITCH has some great documentation on exactly what you need to do. We will eventually paraphrase/summarize this documentation here, but for now, the SWITCH page will have to do.
When initially setting up Apache & mod_shib, https://samltest.id/ provides a great testing ground for your configurations. This site provides a
sample/demo Shibboleth IdP (as well as a sample Shibboleth SP) which you can test against. It acts as a "sandbox" to get your configurations
working properly, before you point DSpace at your production Shibboleth IdP.
You also may wish to review the Shibboleth setup in our "dspace-shibboleth" Docker setup which the development team uses for testing (and it
uses https://samltest.id as the IdP). It may provide you with good examples/hints on getting everything setup. However, keep in mind this code
has not been tested in Production scenarios.
Below, we have provided a sample Apache configuration. However, as every institution has their own specific Apache setup/configuration, it is highly likely
that you will need to tweak this configuration in order to get it working properly. Again, see the official mod_shib documentation for much more detail about
each of these settings: https://wiki.shibboleth.net/confluence/display/SHIB2/NativeSPApacheConfig These configurations are meant to be added to an
Apache <VirtualHost> which acts as a proxy to your Tomcat (or other servlet container) running DSpace. More information on Apache VirtualHost settings
can be found at: https://httpd.apache.org/docs/2.2/vhosts/
#### SAMPLE MOD_SHIB CONFIGURATION FOR APACHE2 (it may require local modifications based on your Apache setup) ####
# While this sample VirtualHost is for HTTPS requests (recommended for Shibboleth, obviously),
# you may also need/want to create one for HTTP (*:80)
<VirtualHost *:443>
...
# PLEASE NOTE: We have omitted many Apache settings (ServerName, LogLevel, SSLCertificateFile, etc)
# which you may need/want to add to your VirtualHost
# Most DSpace instances will want to use Shibboleth "Lazy Session", which ensures that users
# can access DSpace without first authenticating via Shibboleth.
# This section turns on Shibboleth "Lazy Session". Also ensures that once they have authenticated
# (by accessing /Shibboleth.sso/Login path), then their Shib session is kept alive
<Location />
AuthType shibboleth
ShibRequireSession Off
require shibboleth
# If your "shibboleth2.xml" file specifies an <ApplicationOverride> setting for your
# DSpace Service Provider, then you may need to tell Apache which "id" to redirect Shib requests to.
# Just uncomment this and change the value "my-dspace-id" to the associated @id attribute value.
#ShibRequestSetting applicationId my-dspace-id
</Location>
# If a user attempts to access the DSpace shibboleth endpoint, force them to authenticate via Shib.
<Location "/server/api/authn/shibboleth">
Order deny,allow
Allow from all
AuthType shibboleth
ShibRequireSession On
# Please note that setting ShibUseHeaders to "On" is a potential security risk.
# You may wish to set it to "Off". See the mod_shib docs for details about this setting:
# https://wiki.shibboleth.net/confluence/display/SHIB2/NativeSPApacheConfig#NativeSPApacheConfig-AuthConfigOptions
# Here's a good guide to configuring Apache + Tomcat when this setting is "Off":
# https://www.switch.ch/de/aai/support/serviceproviders/sp-access-rules.html#javaapplications
ShibUseHeaders On
Require shibboleth
</Location>
# If a user attempts to access the DSpace login endpoint, ensure Shibboleth is supported but other auth methods can be too.
<Location "/server/api/authn/login">
Order deny,allow
Allow from all
AuthType shibboleth
# For DSpace, this is required to be off, otherwise the available auth methods will not be visible
ShibRequireSession Off
# Please note that setting ShibUseHeaders to "On" is a potential security risk.
# You may wish to set it to "Off". See the mod_shib docs for details about this setting:
# https://wiki.shibboleth.net/confluence/display/SHIB2/NativeSPApacheConfig#NativeSPApacheConfig-AuthConfigOptions
# Here's a good guide to configuring Apache + Tomcat when this setting is "Off":
# https://www.switch.ch/de/aai/support/serviceproviders/sp-access-rules.html#javaapplications
ShibUseHeaders On
</Location>
# Finally, you may need to ensure requests to /Shibboleth.sso are NOT redirected
# to Tomcat (as they need to be handled by mod_shib instead).
# NOTE: THIS SETTING IS LIKELY ONLY NEEDED IF YOU ARE USING mod_proxy TO REDIRECT
# ALL REQUESTS TO TOMCAT (e.g. ProxyPass /server ajp://localhost:8009/server)
<IfModule mod_proxy.c>
ProxyPass /Shibboleth.sso !
</IfModule>
...
# You will likely need Proxy settings to ensure Apache is proxying requests to Tomcat for the DSpace REST API
# The below is just an example of proxying for REST API only. It requires installing & enabling "mod_proxy" and "mod_proxy_ajp"
## Proxy / Forwarding Settings ##
<Proxy *>
AddDefaultCharset Off
Order allow,deny
Allow from all
</Proxy>
# Optionally, also proxy Angular UI (if on same server). This requires "mod_proxy_http"
#ProxyPass / http://localhost:4000/
#ProxyPassReverse / http://localhost:4000/
</VirtualHost>
In addition, here's a sample "ApplicationOverride" configuration for "shibboleth2.xml". This particular "ApplicationOverride" is configured to use the Test IdP provided by https://samltest.id/ and is just meant as an example. In order to enable it for testing purposes, you must specify ShibRequestSetting applicationId samltest in your Apache mod_shib configuration (see above). An additional, more detailed example is provided in our "dspace-shibboleth" Docker configurations at https://github.com/DSpace/DSpace/blob/main/dspace/src/main/docker/dspace-shibboleth/shibboleth2.xml
<!-- We'll use a TEST IdP, hosted by the awesome https://samltest.id/ testing service. -->
<!-- See also: https://wiki.shibboleth.net/confluence/display/SHIB2/NativeSPServiceSSO -->
<!-- DSPACE 7 requires Shibboleth to use "SameSite=None" property for its Cookies -->
<Sessions lifetime="28800" timeout="3600" checkAddress="false" relayState="ss:mem" handlerSSL="true" cookieProps="; path=/; SameSite=None; secure; HttpOnly">
<SSO entityID="https://samltest.id/saml/idp">
SAML2 SAML1
</SSO>
</Sessions>
<!-- Loads and trusts a metadata file that describes the IdP and how to communicate with it. -->
<!-- By default, metadata is retrieved from the TEST IdP at https://samltest.id/ -->
<!-- and is cached in a local file named "samltest-metadata.xml". -->
<!-- See also: https://wiki.shibboleth.net/confluence/display/SHIB2/NativeSPMetadataProvider -->
<MetadataProvider type="XML" uri="https://samltest.id/saml/idp"
backingFilePath="samltest-metadata.xml" reloadInterval="180000"/>
</ApplicationOverride>
<Attributes xmlns="urn:mace:shibboleth:2.0:attribute-map" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<!-- Custom Attributes specific to samltest.id -->
<Attribute name="urn:oid:0.9.2342.19200300.100.1.1" id="uid"/>
<Attribute name="urn:oid:0.9.2342.19200300.100.1.3" id="mail"/>
<Attribute name="urn:oid:2.5.4.4" id="sn"/>
<Attribute name="urn:oid:2.16.840.1.113730.3.1.241" id="displayName"/>
<Attribute name="urn:oid:2.5.4.20" id="telephoneNumber"/>
<Attribute name="urn:oid:2.5.4.42" id="givenName"/>
<Attribute name="https://samltest.id/attributes/role" id="role"/>
...
<!-- In addition to the attribute mapping, DSpace expects the following Shibboleth Headers to be set:
- SHIB-NETID
- SHIB-MAIL
- SHIB-GIVENNAME
- SHIB-SURNAME
These are set by mapping the respective IdP attribute (left hand side) to the header attribute (right
hand side).
-->
<Attribute name="urn:oid:0.9.2342.19200300.100.1.1" id="SHIB-NETID"/>
<Attribute name="urn:mace:dir:attribute-def:uid" id="SHIB-NETID"/>
</Attributes>
DSpace supports authentication using NetID, or email address. A user's NetID is a unique identifier from the IdP that identifies a particular user. The NetID
can be of almost any form such as a unique integer, string, or with Shibboleth 2.0 you can use "targeted ids". You will need to coordinate with your
shibboleth federation or identity provider. There are three ways to supply identity information to DSpace:
The NetID-based method is superior because users may change their email address with the identity provider; when this happens, DSpace will not be able to associate their new address with their old account.
In the case where a NetID header is not available or not found, DSpace will fall back to identifying a user based upon their email address.
In the event that neither of these Shibboleth headers is found, then as a last resort DSpace will look at Tomcat's remote user field. This is the least attractive option because Tomcat has no way to supply additional attributes about a user. Because of this, the autoregister option is not supported if this method is used.
If you are currently using Email based authentication (either 1 or 2) and want to upgrade to NetID based authentication, then there is an easy path. Simply enable Shibboleth to pass the NetID attribute and set the netid-header below to the correct value. When a user attempts to log in to DSpace, DSpace will first look for an EPerson with the passed NetID; when this fails, DSpace will fall back to email based authentication. Then DSpace will update the user's EPerson account record to set their NetID, so all future authentications for this user will be based upon NetID. One thing to note is that DSpace will prevent an account from switching NetIDs. If an account already has a NetID set and the user then tries to authenticate with a different NetID, the authentication will fail.
EPerson Metadata:
One of the primary benefits of using Shibboleth based authentication is receiving additional attributes about users such as their names, telephone numbers, and possibly their academic department or graduation semester if desired. DSpace treats the first and last name attributes differently because they (along with email address) are the three pieces of minimal information required to create a new user account. For both first and last name, supply direct mappings to the Shibboleth headers. In addition to the first and last name, DSpace supports other metadata fields such as phone, or really anything you want to store on an eperson object. Beyond the phone field, which is accessible in the user's profile screen, none of these additional metadata fields will be used by DSpace out-of-the-box. However, if you develop any local modifications, you may access these attributes from the EPerson object. The Vireo ETD workflow system utilizes this to aid students when submitting an ETD.
Role-based Groups:
DSpace is able to place users into pre-defined groups based upon values received from Shibboleth. Using this option you can, for example, place all faculty members into a DSpace group when the correct affiliation attribute is provided. Groups assigned this way are considered 'special groups': they are real groups, but the user's membership in them is not recorded in the database. Each time a user authenticates they are automatically placed within the pre-defined DSpace group, so if the user loses their affiliation, then the next time they log in they will no longer be in the group.
Depending upon the Shibboleth attribute used in the role-header, it may be scoped. "Scoped" is Shibboleth terminology for identifying where an attribute originated. For example, a student's affiliation may be encoded as "[email protected]". The part after the @ sign is the scope, and the part before it is the value. You may match on the whole attribute, or on only the value or only the scope. Using this, you could assign students at one institution to a different role than students at another institution. Or, if you turn on ignore-scope, you could ignore the institution and place all students into one group.
The values extracted (a user may have multiple roles) will be used to look up which groups to place the user into. The groups are defined as "authentication-shibboleth.role.<role-name>", which is a comma-separated list of DSpace groups.
In addition to the below settings, you may need to ensure your Shibboleth IdP is trusted by the DSpace backend by adding it to your rest.cors.allowed-origins configuration. This is required for Safari web browsers to work with DSpace's Shibboleth plugin.
For example, if your IdP is https://samltest.id/, then you need to append that URL to the comma-separated list of "allowed-origins" like:
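A sketch of the resulting setting (the first value, ${dspace.ui.url}, is the usual default for allowed-origins; verify against your own rest.cfg / local.cfg before relying on it):

```
rest.cors.allowed-origins = ${dspace.ui.url}, https://samltest.id
```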
More information on this configuration can be found in the REST API documentation.
Configuration File: [dspace]/config/modules/authentication-shibboleth.cfg
Property: authentication-shibboleth.lazysession
Informational Note: Whether to use lazy sessions or active sessions. For most DSpace instances, you will likely want to use lazy sessions. Active sessions will force every user to authenticate via Shibboleth before they can access your DSpace (essentially resulting in a "dark archive").
Property: authentication-shibboleth.lazysession.loginurl
Informational Note: The URL to start a Shibboleth session (only for lazy sessions). Generally this setting will be "/Shibboleth.sso/Login"
Property: authentication-shibboleth.lazysession.secure
Informational Note: Force HTTPS when authenticating (only for lazy sessions). Generally this is recommended to be "true".
Property: authentication-shibboleth.netid-header
Informational Note: The HTTP header where Shibboleth will supply a user's NetID. This HTTP header should be specified as an Attribute within your Shibboleth "attribute-map.xml" configuration file.
Property: authentication-shibboleth.email-header
Informational Note: The HTTP header where Shibboleth will supply a user's email address. This HTTP header should be specified as an Attribute within your Shibboleth "attribute-map.xml" configuration file.
Property: authentication-shibboleth.email-use-tomcat-remote-user
Example Value: authentication-shibboleth.email-use-tomcat-remote-user = false
Informational Note: When the netid or email headers are not available, should Shibboleth authentication fall back to using Tomcat's remote user feature? Generally this is not recommended. See the "Authentication Methods" section above.
Property: authentication-shibboleth.reconvert.attributes
Informational Note: Shibboleth attributes are by default UTF-8 encoded. Some servlet containers automatically convert the attributes from ISO-8859-1 (Latin-1) to UTF-8. As the attributes were already UTF-8 encoded, it may be necessary to reconvert them. If you set this property true, DSpace converts all Shibboleth attributes retrieved from the servlet container from UTF-8 to ISO-8859-1 and uses the result as if it were UTF-8. This procedure restores the Shibboleth attributes if the servlet container wrongly converted them from ISO-8859-1 to UTF-8. Set this true if you notice character encoding problems within Shibboleth attributes.
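The double conversion can be demonstrated in a few lines of Python (a sketch of the idea only, not DSpace's implementation):

```python
# A UTF-8 value at the identity provider...
original = "é"
# ...wrongly treated as ISO-8859-1 by the servlet container and re-encoded:
garbled = original.encode("utf-8").decode("iso-8859-1")
# reconvert.attributes reverses that mistaken conversion:
restored = garbled.encode("iso-8859-1").decode("utf-8")
print(restored)  # é
```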
Property: authentication-shibboleth.autoregister
Property: authentication-shibboleth.sword.compatibility
Informational Note: SWORD compatibility will allow this authentication method to work when using SWORD. SWORD relies on username and password based authentication and is entirely incapable of supporting Shibboleth. This option allows you to authenticate usernames and passwords for SWORD sessions without adding another authentication method onto the stack. You will need to ensure that a user has a password. One way to do that is to create the user via the create-administrator command line command and then edit their permissions.
WARNING: If you enable this option while ALSO having "PasswordAuthentication" enabled, then you should ensure that "PasswordAuthentication" is listed prior to "ShibAuthentication" in your authentication.cfg file. Otherwise, ShibAuthentication will be used to authenticate all of your users INSTEAD OF PasswordAuthentication.
Property: authentication-shibboleth.firstname-header
Informational Note: The HTTP header where Shibboleth will supply a user's given name. This HTTP header should be specified as an Attribute within your Shibboleth "attribute-map.xml" configuration file.
Property: authentication-shibboleth.lastname-header
Informational Note: The HTTP header where Shibboleth will supply a user's surname. This HTTP header should be specified as an Attribute within your Shibboleth "attribute-map.xml" configuration file.
Property: authentication-shibboleth.eperson.metadata
Example Value:
authentication-shibboleth.eperson.metadata = \
SHIB-telephone => phone, \
SHIB-cn => cn
Informational Note: Additional user attribute mappings; multiple attributes may be stored for each user. The left side is the Shibboleth-based metadata header and the right side is the EPerson metadata field to map the attribute to.
Property: authentication-shibboleth.eperson.metadata.autocreate
Informational Note: If the EPerson metadata field is not found, should it be automatically created?
Property: authentication-shibboleth.role-header
Informational Note: The Shibboleth header holding the user's Shibboleth roles. See the "Role-based Groups" section above for more info.
Property: authentication-shibboleth.role-header.ignore-scope
Example Value: authentication-shibboleth.role-header.ignore-scope = true
Informational Note: Whether to ignore roles' scopes (everything after the @ sign for scoped attributes)
Property: authentication-shibboleth.role-header.ignore-value
Informational Note: Whether to ignore roles' values (everything before the @ sign for scoped attributes)
Property: authentication-shibboleth.role.[affiliation-attribute]
Example Value:
authentication-shibboleth.role.faculty = Faculty, Member
authentication-shibboleth.role.staff = Staff, Member
authentication-shibboleth.role.student = Students, Member
Informational Note: Mapping of affiliation values to DSpace groups. See the "Role-based Groups" section above for more info.
Property: authentication-shibboleth.default-roles
Informational Note: These roles are assumed if no roles were sent by Shibboleth, or if there was no header with a name matching the value of authentication-shibboleth.role-header. May be repeated to provide multiple default roles.
LDAP Authentication
For an explanation of common LDAP terms such as cn, ou and dc, see: https://stackoverflow.com/questions/18756688/what-are-cn-ou-dc-in-an-ldap-search
Configuration File: [dspace]/config/modules/authentication.cfg
Property: plugin.sequence.org.dspace.authenticate.AuthenticationMethod
Example Value:
plugin.sequence.org.dspace.authenticate.AuthenticationMethod = org.dspace.authenticate.LDAPAuthentication
If you want to give any special privileges to LDAP users, create a stackable authentication method to automatically put people who have a netid into a
special group. You might also want to give certain email addresses special privileges. Refer to the Custom Authentication Code section below for more
information about how to do this.
NOTE: As of DSpace 6, commas (,) are now a special character in the Configuration system. As some LDAP configuration may contain commas, you must
be careful to escape any required commas by adding a backslash (\) before each comma, e.g. "\,". The configuration reference for authentication-ldap.cfg
has been updated below with additional examples.
Configuration File: [dspace]/config/modules/authentication-ldap.cfg
Property: authentication-ldap.enable
Informational Note: This setting will enable or disable LDAP authentication in DSpace. With the setting off, users will be required to register and log in with their email address. With this setting on, users will be able to log in and register with their LDAP user ids and passwords.
Property: authentication-ldap.autoregister
Informational Note: This will turn LDAP autoregistration on or off. With this on, a new EPerson object will be created for any user who successfully authenticates against the LDAP server when they first log in. With this setting off, the user must first register to get an EPerson object by entering their LDAP username and password and filling out the forms.
Property: authentication-ldap.provider_url
Informational Note: This is the URL of your institution's LDAP server. You may or may not need the /o=myu.edu part at the end. Your server may also require the ldaps:// protocol. (This field has no default value)
NOTE: As of DSpace 6, commas (,) are a special character in the Configuration system. Therefore, be careful to escape any required commas in this configuration by adding a backslash (\) before each comma, e.g. "\,"
Property: authentication-ldap.starttls
Informational Note: Should we issue StartTLS after establishing the TCP connection, in order to initiate an encrypted connection?
Note: This (TLS) is different from LDAPS:
TLS is a tunnel for plain LDAP and is typically recognized on the same port (standard LDAP port: 389)
LDAPS is a separate protocol, deprecated in favor of the standard TLS method. (standard LDAPS port: 636)
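For example, a minimal sketch (hostname and organization below are illustrative placeholders; adjust to your directory):

```
# StartTLS upgrades a plain LDAP connection (port 389); do NOT use ldaps:// here
authentication-ldap.starttls = true
authentication-ldap.provider_url = ldap://ldap.myu.edu:389/o=myu.edu
```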
Property: authentication-ldap.id_field
Explanation: This is the unique identifier field in the LDAP directory where the username is stored. (This field has no default value)
Property: authentication-ldap.object_context
Informational Note: This is the LDAP object context to use when authenticating the user. By default, DSpace will use this value to create the user's DN in order to attempt to authenticate them. It is appended to the id_field and username. For example: uid=username\,ou=people\,o=myu.edu. You will need to modify this to match your LDAP configuration. (This field has no default value)
If your users do NOT all exist under a single "object_context" in LDAP, then you should ignore this setting and INSTEAD use the Hierarchical LDAP Authentication settings below (especially see "search.user" or "search.anonymous")
NOTE: As of DSpace 6, commas (,) are a special character in the Configuration system. Therefore, be careful to escape any required commas in this configuration by adding a backslash (\) before each comma, e.g. "\,"
Property: authentication-ldap.search_context
Informational Note: This is the search context used when looking up a user's LDAP object to retrieve their data for autoregistering. With autoregister=true, when a user authenticates without an EPerson object, we search the LDAP directory to get their name (id_field) and email address (email_field) so that we can create one for them. So, after we have authenticated against uid=username,ou=people,o=byu.edu we then search in ou=people, filtering on [uid=username]. Often the search_context is the same as the object_context parameter, but again this depends on your LDAP server configuration. (This field has no default value, and it MUST be specified when either search.anonymous=true or search.user is specified)
NOTE: As of DSpace 6, commas (,) are a special character in the Configuration system. Therefore, be careful to escape any required commas in this configuration by adding a backslash (\) before each comma, e.g. "\,"
Property: authentication-ldap.email_field
Example Value: authentication-ldap.email_field = mail
Informational Note: This is the LDAP object field where the user's email address is stored. "mail" is the most common for LDAP servers. (This field has no default value)
If the "email_field" is unspecified, or the user has no email address in LDAP, his/her username (id_field value) will be saved as the email in DSpace (or appended to netid_email_domain, when specified)
Property: authentication-ldap.netid_email_domain
Informational Note: If your LDAP server does not hold an email address for a user (i.e. no email_field), you can use this field to specify your email domain. This value is appended to the netid (id_field) in order to make an email address (which is then stored in the DSpace EPerson). For example, a netid of 'user' and a netid_email_domain of @example.com would set the email of the user to be [email protected]
Please note: this field will only be used if "email_field" is unspecified OR the user in question has no email address stored in LDAP. If both "email_field" and "netid_email_domain" are unspecified, then the "id_field" will be used as the email address.
Property: authentication-ldap.surname_field
Informational Note: This is the LDAP object field where the user's last name is stored. "sn" is the most common for LDAP servers. If the field is not found, the field will be left blank in the new EPerson object. (This field has no default value)
Property: authentication-ldap.givenname_field
Informational Note: This is the LDAP object field where the user's given names are stored. I'm not sure how common the givenName field is in different LDAP instances. If the field is not found, the field will be left blank in the new EPerson object. (This field has no default value)
Property: authentication-ldap.phone_field
Informational Note: This is the field where the user's phone number is stored in the LDAP directory. If the field is not found, the field will be left blank in the new EPerson object. (This field has no default value)
Property: authentication-ldap.login.specialgroup
Informational Note: If specified, all users who successfully log in via LDAP will automatically become members of this DSpace Group (for the remainder of their current, logged-in session). This DSpace Group must already exist (it will not be automatically created).
This is useful if you want a DSpace Group made up of all internal authenticated users. This DSpace Group can then be used to bestow special permissions on any users who have authenticated via LDAP (e.g. you could allow anyone authenticated via LDAP to view special, on-campus-only collections or similar)
Property: authentication-ldap.login.groupmap.*
Informational Note: The left part of the value (before the ":") must correspond to a portion of a user's DN (unless "login.groupmap.attribute" is specified; please see below). The right part of the value corresponds to the name of an existing DSpace group.
For example, given a mapping whose left part matches "OU=Students" and whose right part is "ALL_STUDENTS", a user logging in with the DN:
cn=jdoe,OU=Students,OU=Users,dc=example,dc=edu
would get assigned to the ALL_STUDENTS DSpace group for the remainder of their current session.
However, if that same user later graduates and is employed by the university, their DN in LDAP may change to:
cn=jdoe,OU=Employees,OU=Users,dc=example,dc=edu
Upon logging into DSpace after that DN change, the authenticated user would now be assigned to the ALL_EMPLOYEES DSpace group for the remainder of their current session.
Note: This option can be used independently from the login.specialgroup option, which will put all LDAP users into a single DSpace group. Both options may be used together.
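Based on the DN examples above, such a mapping might look like the following sketch (the DN fragments are illustrative, and the ALL_STUDENTS and ALL_EMPLOYEES groups must already exist in DSpace):

```
authentication-ldap.login.groupmap.1 = OU=Students:ALL_STUDENTS
authentication-ldap.login.groupmap.2 = OU=Employees:ALL_EMPLOYEES
```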
Property: authentication-ldap.login.groupmap.attribute
Informational Note: The value of "authentication-ldap.login.groupmap.attribute" should specify the name of a single LDAP attribute. If this property is uncommented, it changes the meaning of the left part of "authentication-ldap.login.groupmap.*" (see above) as follows:
If the authenticated user has this LDAP attribute, look up the value of this LDAP attribute in the left part (before the ":") of the authentication-ldap.login.groupmap.* value
If that LDAP value is found in any "authentication-ldap.login.groupmap.*" field, assign this authenticated user to the DSpace Group specified by the right part (after the ":") of the authentication-ldap.login.groupmap.* value.
For example:
authentication-ldap.login.groupmap.attribute = group
authentication-ldap.login.groupmap.1 = mathematics:Mathematics_Group
The above would ensure that any authenticated users where their LDAP "group" attribute equals "mathematics" would be added to
the DSpace Group named "Mathematics_Group" for the remainder of their current session. However, if that same user logged in
later with a new LDAP "group" value of "computer science", he/she would no longer be a member of the "Mathematics_Group" in
DSpace.
One example of such an LDAP tool is the ldapsearch command-line tool available on most Linux operating systems (e.g. in Debian / Ubuntu it's available in the "ldap-utils" package). Below are some example ldapsearch commands that can be used to determine (and/or debug) specific configurations in your authentication-ldap.cfg. In the examples below, we've used the names of specific DSpace configurations as placeholders (in square brackets).
# Attempt to list the first 100 users in a given [search_context], returning the "cn", "mail" and "sn" fields for each
ldapsearch -x -H [provider_url] -D [search.user] -W -b [search_context] -z 100 cn mail sn

# Attempt to find the first 100 users whose [id_field] starts with the letter "t", returning the [id_field], "cn", "mail" and "sn" fields for each
ldapsearch -x -H [provider_url] -D [search.user] -W -b [search_context] -z 100 -s sub "([id_field]=t*)" [id_field] cn mail sn
SSL Connection Errors: If you are using ldapsearch with an LDAPS connection (secure connection), you may receive "peer cert untrusted or revoked"
errors if the LDAP SSL certificate is self-signed. You can temporarily tell LDAP to accept any security certificate by setting TLS_REQCERT allow in your
ldapsearch's ldap.conf file. Be sure to remove this setting however after you are done testing!
# FOR TESTING ONLY! This setting disables the check for a valid LDAP Server security certificate,
# which is considered a security issue for production LDAP setups. Setting this to "allow" tells
# the LDAP client to accept any security certificates that it cannot verify or validate.
TLS_REQCERT allow
http://www.bind9.net/manual/openldap/2.3/tls.html
http://muzso.hu/2012/03/29/how-to-configure-ssl-aka.-ldaps-for-libnss-ldap-auth-client-config-in-ubuntu
If your users are spread out across a hierarchical tree on your LDAP server, you may wish to have DSpace search for the user name in your tree. You can optionally specify the search scope. If anonymous access is not enabled on your LDAP server, you will need to specify the full DN and password of a user that is allowed to bind in order to search for the users.
Configuration File: [dspace]/config/modules/authentication-ldap.cfg
Property: authentication-ldap.search_scope
Informational Note: This is the search scope value for the LDAP search during autoregistering (autoregister=true). This will depend on your LDAP server setup, and is only really necessary if your users are spread out across a hierarchical tree on your LDAP server. The value must be one of the following integers:
object scope: 0
one level scope: 1
subtree scope: 2
Please note that "search_context" in the LDAP configurations must also be specified.
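For example, to search the entire subtree beneath your search_context (using the value names listed above):

```
authentication-ldap.search_scope = 2
```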
Property: authentication-ldap.search.anonymous
Informational Note: If true, DSpace will anonymously search LDAP (in the "search_context") for the DN of the user trying to log in to DSpace. This setting is "false" by default. By default, DSpace will either use "search.user" to authenticate for the LDAP search (if search.user is specified), or will use the "object_context" value to create the user's DN.
Property: authentication-ldap.search.user
authentication-ldap.search.password
Informational Note: The full DN and password of a user allowed to connect to the LDAP server and search (in the "search_context") for the DN of the user trying to log in. By default, if unspecified, DSpace will either search LDAP anonymously for the user's DN (when search.anonymous=true), or will use the "object_context" value to create the user's DN.
NOTE: As of DSpace 6, commas (,) are a special character in the Configuration system. Therefore, be careful to escape any required commas in this configuration by adding a backslash (\) before each comma, e.g. "\,"
ORCID Authentication
To enable ORCID Authentication, see the documentation for enabling the ORCID Integration. You do not need to enable ORCID synchronization, but you
currently must enable Researcher Profiles and Configurable Entities.
IP Authentication
Enabling IP Authentication
To enable IP Authentication, you must ensure the org.dspace.authenticate.IPAuthentication class is listed as one of the
AuthenticationMethods in the following configuration:
Configuration File: [dspace]/config/modules/authentication.cfg
Property: plugin.sequence.org.dspace.authenticate.AuthenticationMethod
Example Value:
plugin.sequence.org.dspace.authenticate.AuthenticationMethod = org.dspace.authenticate.IPAuthentication
Configuring IP Authentication
Once enabled, you are then able to map DSpace groups to IP addresses in authentication-ip.cfg by setting ip.GROUPNAME = iprange[, iprange ...], e.g.:
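For example (the group name and address ranges below are illustrative; the DSpace group must already exist):

```
ip.MY_UNIVERSITY = 10.1.2.3, 13.5, 11.3.4.5/24
```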
Negative matches can be set by prepending the entry with a '-'. For example, if you want to include all of a class B network except for users of a contained class C network, you could use: 111.222,-111.222.333.
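The positive/negative matching logic can be sketched as follows (illustrative only, not DSpace's code; note that DSpace's own syntax also accepts partial dotted quads like "111.222", whereas this sketch uses CIDR notation only):

```python
import ipaddress

def ip_matches(ip, ranges):
    """Return True if ip matches any positive range and no negative range.
    Ranges mimic authentication-ip.cfg entries; a '-' prefix means a
    negative match that excludes the address from the group."""
    addr = ipaddress.ip_address(ip)
    matched = False
    for r in ranges:
        negative = r.startswith("-")
        net = ipaddress.ip_network(r.lstrip("-"), strict=False)
        if addr in net:
            if negative:
                return False
            matched = True
    return matched

# All of 10.1.0.0/16 except the contained 10.1.2.0/24 subnet:
ranges = ["10.1.0.0/16", "-10.1.2.0/24"]
```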
Notes:
If the Groupname contains blanks you must escape the spaces, e.g. "Department\ of\ Statistics"
If your DSpace installation is hidden behind a web proxy, remember to set the useProxies configuration option within the 'Logging' section of dspace.cfg to use the IP address of the user rather than the IP address of the proxy server.
X.509 Certificate Authentication
1. See the HTTPS installation instructions to configure your Web server. If you are using HTTPS with Tomcat, note that the <Connector> tag must include the attribute clientAuth="true" so the server requests a personal Web certificate from the client.
2. Add the org.dspace.authenticate.X509Authentication plugin first to the list of stackable authentication methods in the value of the
configuration key plugin.sequence.org.dspace.authenticate.AuthenticationMethod
Configuration File: [dspace]/config/modules/authentication.cfg
Property: plugin.sequence.org.dspace.authenticate.AuthenticationMethod
Example Value:
plugin.sequence.org.dspace.authenticate.AuthenticationMethod = org.dspace.authenticate.X509Authentication
plugin.sequence.org.dspace.authenticate.AuthenticationMethod = org.dspace.authenticate.PasswordAuthentication
1. You must also configure DSpace with the same CA certificates as the web server, so it can accept and interpret the clients' certificates. It can share the same keystore file as the web server, or a separate one, or a CA certificate in a file by itself. Configure it by one of these methods, for example via the Java keystore:
authentication-x509.keystore.path = path to Java keystore file
authentication-x509.keystore.password = password to access the keystore
2. Choose whether to enable auto-registration: If you want users who authenticate successfully to be automatically registered as new E-Persons if
they are not already, set the autoregister configuration property to true. This lets you automatically accept all users with valid personal
certificates. The default is false.
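As a sketch, in [dspace]/config/modules/authentication-x509.cfg this would look like (verify the exact property name against your version's config file):

```
authentication-x509.autoregister = true
```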
By keeping this code in a separate method, we can customize the authentication process for MIT by simply adding it to the stack in the DSpace
configuration. None of the code has to be touched.
You can create your own custom authentication method and add it to the stack. Use the most similar existing method as a model, e.g. org.dspace.authenticate.PasswordAuthentication for an "explicit" method (with credentials entered interactively) or org.dspace.authenticate.X509Authentication for an implicit method.
Bulk Access Management
Bulk Access Management lets administrators edit the access conditions of Metadata and Bitstreams on many selected objects at once.
Usage:
When logged in as an Administrator, it is possible to change the access conditions of Metadata and Bitstreams on items by following the path:
"Management > Access Control > Bulk Access Management".
Community or Collection Administrators may also use this tool from the "Access Control" tab of the "Edit Community", "Edit Collection" or "Edit
Item" page. In this scope, the tool will only perform changes within the selected Community/Collection/Item.
In the first box (Step 1), select the objects whose access conditions will be changed.
In the second box (Step 2), choose whether to change the access conditions on Metadata, on Bitstreams, or on both.
If there is no previous access condition defined, a warning box will appear.
Openaccess
Administrator
Embargo
Lease
When done, click Execute. The process will start. If it succeeds, the process page will display a success message.
At the moment, DSpace supports a single feature configuration, defined by the defaultBulkAccessConditionConfiguration bean, which specifies the access conditions available for the Item's and the Bitstream's metadata.
The access conditions listed in the dropdown menu are set by default as Openaccess, Administrator, Embargo, and Lease.
<bean id="defaultBulkAccessConditionConfiguration"
class="org.dspace.app.bulkaccesscontrol.model.BulkAccessConditionConfiguration">
<property name="name" value="default"/>
<property name="itemAccessConditionOptions">
<list>
<ref bean="openAccess"/>
<ref bean="administrator"/>
<ref bean="embargoed" />
<ref bean="lease"/>
</list>
</property>
<property name="bitstreamAccessConditionOptions">
<list>
<ref bean="openAccess"/>
<ref bean="administrator"/>
<ref bean="embargoed" />
<ref bean="lease"/>
</list>
</property>
</bean>
<bean id="bulkAccessConditionConfigurationService"
class="org.dspace.app.bulkaccesscontrol.service.BulkAccessConditionConfigurationService">
<property name="bulkAccessConditionConfigurations">
<list>
<ref bean="defaultBulkAccessConditionConfiguration"/>
</list>
</property>
</bean>
Embargo
1 What is an Embargo?
2 DSpace Embargo Functionality
2.1 Managing Embargoes on existing Items
3 Configuring and using Embargo in DSpace Submission User Interface
3.1 Enabling Item-level Embargo
3.2 Configuring Embargo / Access Restriction options
3.3 Private/Public (or Non-Discoverable/Discoverable) Item
3.4 Pre-3.0 Embargo Migration Routine
4 Technical Specifications
4.1 Introduction
4.2 ResourcePolicy
4.3 Item
4.4 Item.inheritCollectionDefaultPolicies(Collection c)
4.5 AuthorizeService
4.6 Withdraw Item
4.7 Reinstate Item
4.8 Pre-DSpace 3.0 Embargo Compatibility
5 Creating Embargoes via Metadata
5.1 Introduction
5.2 Setting Embargo terms via metadata
5.2.1 Terms assignment
5.2.2 Terms interpretation/imposition
5.2.3 Embargo period
5.3 Configuration of metadata fields
5.4 Operation
5.5 Extending embargo functionality
5.5.1 Setter
5.5.2 Lifter
What is an Embargo?
An embargo is a temporary access restriction placed on metadata or bitstreams (i.e. files). Its scope or duration may vary, but the fact that it eventually
expires is what distinguishes it from other content restrictions. For example, it is not unusual for content destined for DSpace to come with permanent
restrictions on use or access based on license-driven or other IP-based requirements that limit access to institutionally affiliated users. Restrictions such as
these are imposed and managed using standard administrative tools in DSpace, typically by attaching specific access policies (aka "resource policies") to
Items, Collections, Bitstreams, etc.
Embargo functionality was originally introduced as part of DSpace 1.6, enabling embargoes on the level of items that applied to all bitstreams included in
the item. Since DSpace 3.0, this functionality has been extended to the Submission User Interface, enabling embargoes on the level of individual
bitstreams.
When an embargo is set on either an item level or a bitstream level, a new ResourcePolicy (i.e. access policy) is added to the corresponding Item or
Bitstream. This ResourcePolicy will automatically control the lifting of the embargo (when the embargo date passes). An embargo lift date is
generally stored as the "start date" of such a policy. Essentially, this means that the access rights defined in the policy do not get applied until after that
date passes (and prior to that date, the access rights will default to Admin only).
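The "start date" semantics can be sketched in a few lines (illustrative only, not DSpace's AuthorizeService code):

```python
from datetime import date

def policy_in_effect(start_date, today=None):
    """A resource policy's access rights apply only once its start date
    has passed; a policy with no start date is always in effect."""
    today = today or date.today()
    return start_date is None or today >= start_date

# An embargo lifting on 2025-01-01 keeps the read policy inactive before
# that date, so only Administrators can access the object until then.
embargo_start = date(2025, 1, 1)
```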
The scheduled, manual "embargo-lifter" commands (used prior to DSpace 3) are no longer necessary and not recommended to run.
To add an embargo, edit the appropriate policy and set a "start date". To add a full Item embargo (including metadata), edit the Item policy. To embargo individual bitstreams, edit the appropriate Bitstream policy.
To remove an embargo, edit the appropriate policy and clear out the "start date".
To change an embargo, edit the appropriate policy and change the "start date" to a new date.
Changes to the embargo should take effect immediately. However, as Administrators have full access to embargoed items, you may need to log out first.
After logging out, you will be subject to the embargo.
Available in DSpace 7.2 and above
In DSpace 7.2 and above, both Item-level embargoes and bitstream (file) level embargoes are supported in the Submission user interface.
In DSpace 7.1 and 7.0, the Submission user interface only supported embargoes on specified bitstreams (files). However, item-level embargoes could be
added after submissions were accepted using the "Manage Embargoes on existing Items" approach described above.
To enable Item-level embargoes, ensure a step like the following is included in your submission process definition (in [dspace]/config/item-submission.xml):
<submission-process name="traditional">
...
<!-- This step enables embargoes and other access restrictions at the Item level -->
<step id="itemAccessConditions"/>
</submission-process>
After making this update, you will need to restart your backend (REST API) for the changes to take effect.
For detailed information on configuring your Embargo options (and other related options like lease, or restricting to a particular group of users), see the Submission User Interface documentation. Specifically these two sections:
For Bitstream embargo / access options, see the section on "Configuring the File Upload step" of the Submission User Interface
For Item embargo / access options, see the section on "Configuring the Item Access Conditions step" of the Submission User Interface
Private (or non-Discoverable) items are not retrievable through the DSpace search, browse or Discovery indexes. However, they are accessible via a direct link. It is possible to create a publicly accessible, non-discoverable item, in which case it can only be shared via a direct link. But, once anyone has that link, the item is available anonymously.
Therefore, an "Admin Search" option is provided, which allows you to search across all items, including private or withdrawn items. You can also filter your
results to display only private items.
In order to migrate old embargoes into ResourcePolicies, a migration routine has been developed. Please note that this migration routine should only
need to be run ONCE (immediately after an upgrade from 1.x.x to a more recent version of DSpace). After that point, any newly defined embargoes will
automatically be stored on ResourcePolicies.
[dspace]/bin/dspace migrate-embargo -a
Technical Specifications
Introduction
The following sections illustrate the technical changes that have been made to the back-end to add the new Advanced Embargo functionality.
ResourcePolicy
When an embargo is set at item level or bitstream level, a new ResourcePolicy will be added.
rpname: resource policy name
rptype: resource policy type
rpdescription: resource policy description
While rpname and rpdescription are fields manageable by users, the rptype is managed by DSpace itself. It represents a type that a resource policy can
assume, among the following:
TYPE_SUBMISSION: all the policies added automatically during the submission process
TYPE_WORKFLOW: all the policies added automatically during the workflow stage
TYPE_CUSTOM: all the custom policies added by users
TYPE_INHERITED: all the policies inherited from the enclosing object (for Item, a Collection; for Bitstream, an Item).
For example, an embargoed Item might carry a ResourcePolicy like the following:
policy_id: 4847
resource_type_id: 2
resource_id: 89
action_id: 0
eperson_id:
epersongroup_id: 0
start_date: 2013-01-01
end_date:
rpname: Embargo Policy
rpdescription: Embargoed through 2012
rptype: TYPE_CUSTOM
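As an illustration, policies like the example record above can be inspected directly in the database. This is a sketch only: it assumes a PostgreSQL database named "dspace" and the legacy column names shown in the example (it requires a live DSpace database, and newer schemas use UUID object references):

```shell
# Sketch: list custom policies whose embargo ("start date") has not yet passed.
# Assumes a PostgreSQL database named "dspace" and the legacy column names above.
psql -d dspace -c "SELECT policy_id, resource_id, start_date, rpname, rptype
                   FROM resourcepolicy
                   WHERE rptype = 'TYPE_CUSTOM'
                     AND start_date > CURRENT_DATE;"
```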
Item
To manage Private/Public state a new boolean attribute has been added to the Item:
isDiscoverable
When an Item is private, the attribute will assume the value false.
Item.inheritCollectionDefaultPolicies(Collection c)
This method has been adjusted to leave custom policies, added by the users, in place and add the default collection policies only if there are no custom
policies.
AuthorizeService
Some methods have been changed on AuthorizeService to manage the new fields and some convenience methods have been introduced:
Withdraw Item
The feature to withdraw an item from the repository has been modified to keep all the custom policies in place.
Reinstate Item
The feature to reinstate an item in the repository has been modified to preserve existing custom policies.
Introduction
Prior to DSpace 3.0, all DSpace embargoes were stored as metadata. While embargoes are no longer stored permanently in metadata fields (they are
now stored on ResourcePolicies, i.e. access policies), embargoes can still be initialized via metadata fields.
This ability to create/initialize embargoes via metadata is extremely powerful if you wish to submit embargoed content via electronic means (such as
Importing Items via Simple Archive Format, SWORDv1, SWORDv2, etc).
These terms are interpreted by the embargo system to yield a specific date on which the embargo can be removed (or "lifted"), and a specific set of access
policies. Obviously, some terms are easier to interpret than others (the absolute date really requires none at all), and the default embargo logic
understands only the most basic terms (the first and third examples above). But as we will see below, the embargo system provides you with the ability to
add your own interpreters to cope with any terms expressions you wish to have. This date that is the result of the interpretation is stored with the item. The
embargo system detects when that date has passed, and removes the embargo ("lifts it"), so the item bitstreams become available. Here is a more
detailed life-cycle for an embargoed item:
Terms assignment
The first step in placing an embargo on an item is to attach (assign) "terms" to it. If these terms are missing, no embargo will be imposed. As we will see
below, terms are carried in a configurable DSpace metadata field, so assigning terms just means assigning a value to a metadata field. This can be done
in a web submission user interface form, in a SWORD deposit package, a batch import, etc. - anywhere metadata is passed to DSpace. The terms are not
immediately acted upon, and may be revised, corrected, or removed up until the next stage of the life-cycle. Thus a submitter could enter one value,
a collection editor could replace it, and only the last value will be used. Since metadata fields are multivalued, theoretically there can be multiple terms values,
but in the default implementation only one is recognized.
Terms interpretation/imposition
In DSpace terminology, when an Item has exited the last of any workflow steps (or if none have been defined for it), it is said to be "installed" into the
repository. At this precise time, the interpretation of the terms occurs, and a computed "lift date" is assigned, and recorded as part of the ResourcePolicy
(aka policy) of the Item. Once the lift date has been assigned to the ResourcePolicy, the metadata field which defined the embargo is cleared. From that
point forward, all embargo information is controlled/defined by the ResourcePolicy.
It is important to understand that this interpretation happens only once (just like the installation). Therefore, updating/changing an embargo cannot be
done via metadata fields. Instead, all embargo updates must be made to the ResourcePolicies themselves (e.g. ResourcePolicies can be managed from
the Admin UI in the Edit Item screens).
Also note that since these policy changes occur before installation, there is no time during which embargoed content is "exposed" (accessible by non-
administrators). The terms interpretation and imposition together are called "setting" the embargo, and the component that performs them both is called the
embargo "setter".
Embargo period
After an embargoed item has been installed, the policy restrictions remain in effect until the embargo date passes. Once the embargo date passes, the
policy restrictions are automatically lifted. An embargo lift date is generally stored as the "start date" of a policy. Essentially, this means that the policy
does not get applied until after that date passes (and prior to that date, the object defaults to Admin only access).
Administrators are able to change the lift date of the embargo by editing the policy (ResourcePolicy). These policies can be managed from the Edit Item
screens.
You replace the placeholder values with real metadata field names. If you only need the "default" embargo behavior - which essentially accepts only
absolute dates as "terms" - this is the only configuration required, except as noted below.
You are free to use existing metadata fields, or create new fields. If you choose the latter, you must understand that the embargo system does not create
or configure these fields: i.e. you must follow all the standard documented procedures for actually creating them (i.e. adding them to the metadata registry,
or to display templates, etc) - this does not happen automatically. Likewise, if you want the field for "terms" to appear in submission screens and workflows,
you must follow the documented procedure for configurable submission (basically, this means adding the field to submission-forms.xml). The flexibility of
metadata configuration makes it easy for you to restrict embargoes to specific collections, since configurable submission can be defined per collection.
Key recommendations:
1. Use a local metadata schema. Breaking compliance with the standard Dublin Core in the default metadata registry can create problems for the
portability of data to/from your repository.
2. If using existing metadata fields, avoid any that are automatically managed by DSpace. For example, fields like "date.issued" or
"date.accessioned" are normally automatically assigned, and thus must not be recruited for embargo use.
3. Do not place the field for "lift date" in submission screens. This can potentially confuse submitters because they may feel that they can directly
assign values to it. As noted in the life-cycle above, this is erroneous: the lift date gets assigned by the embargo system based on the terms. Any
pre-existing value will be over-written. But see next recommendation for an exception.
4. As the life-cycle discussion above makes clear, after the terms are applied, the terms field is no longer actionable in the embargo system.
Conversely, the "lift date" field is not actionable until the terms have been applied. Thus you may want to consider configuring both the "terms"
and "lift date" to use the same metadata field. In this way, during workflow you would see only the terms, and after item installation, only the lift
date. If you wish the metadata to retain the terms for any reason, use 2 distinct fields instead.
Operation
After the fields defined for terms and lift date have been assigned in dspace.cfg, and created and configured wherever they will be used, you can begin to
embargo items simply by entering data (dates, if using the default setter) in the terms field. They will automatically be embargoed as they exit workflow,
and the computed lift date will be stored on the ResourcePolicy.
Setter
The default setter recognizes only two expressions of terms: either a literal, non-relative date in the fixed format "yyyy-mm-dd" (known as ISO 8601), or a
special string used for open-ended embargo (the default configured value for this is "forever", but this can be changed in dspace.cfg to "toujours",
"unendlich", etc). It will perform a minimal sanity check that the date is not in the past. Similarly, the default setter will only remove all read policies as noted
above, rather than applying more nuanced rules (e.g allow access to certain IP groups, deny the rest). Fortunately, the setter class itself is configurable
and you can "plug in" any behavior you like, provided it is written in java and conforms to the setter interface. The dspace.cfg property:
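The default setter's interpretation of terms can be sketched as follows. This is an illustrative re-implementation in shell, not the actual Java setter; the "forever" string stands in for whatever open-ended value is configured in dspace.cfg:

```shell
# Illustrative sketch of the default setter's terms interpretation (not the real Java code).
interpret_terms() {
  terms="$1"
  open_ended="forever"   # the configured open-ended embargo string from dspace.cfg
  if [ "$terms" = "$open_ended" ]; then
    echo "open-ended"
  elif echo "$terms" | grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2}$'; then
    # the real setter also sanity-checks that the date is not in the past
    echo "lift on $terms"
  else
    echo "unrecognized terms: $terms" >&2
    return 1
  fi
}

interpret_terms "2030-01-01"   # prints "lift on 2030-01-01"
interpret_terms "forever"      # prints "open-ended"
```

Any other terms expression (a relative period such as "6 months", for instance) requires plugging in a custom setter as described above.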
Lifter
DEPRECATED: The Lifter is no longer used in the DSpace API, and is not recommended to utilize. Embargo lift dates are now stored on ResourcePolicies
and, as such, are "lifted" automatically when the embargo date passes. Manually running a "lifter" may bypass this automatic functionality and result in
unexpected results.
The default lifter behavior as described above - essentially applying the collection policy rules to the item - might also not be sufficient for all purposes. It
also can be replaced with another class:
Pre-3.0 Embargo Lifter Commands
DEPRECATED - Not recommended to use
The old "embargo-lifter" command is no longer necessary to run. All Embargoes in DSpace are now stored on ResourcePolicies and are lifted
automatically after the lift date has passed. See Embargo documentation for more information.
Continuing to run the "embargo-lifter" is not recommended and this feature will be removed entirely in a future DSpace release.
If you have implemented the pre-DSpace 3.0 Embargo feature, you will need to run it periodically to check for Items with expired embargoes and lift them.
-c or --check ONLY check the state of embargoed Items, do NOT lift any embargoes
-i or --identifier Process ONLY this handle identifier(s), which must be an Item. Can be repeated.
-l or --lift Only lift embargoes, do NOT check the state of any embargoed items.
-v or --verbose Print a line describing the action taken for each embargoed item found.
You must run the Embargo Lifter task periodically to check for items with expired embargoes and lift them from being embargoed. For example, to check
the status, at the CLI:
[dspace]/bin/dspace embargo-lifter -c
To lift the actual embargoes on those items that meet the time criteria, at the CLI:
[dspace]/bin/dspace embargo-lifter -l
Managing User Accounts
From the browser
From the command line
The user command
To create a new user account:
To list accounts:
To modify an account:
To delete an account:
The Groomer
Find accounts with unsalted passwords
Find (and perhaps delete) disused accounts
Cryptographic properties
When a user registers an account for the purpose of subscribing to change notices, submitting content, or the like, DSpace creates an EPerson record in
the database. Administrators can manipulate these records in several ways.
1. Login as an Administrator
2. In the sidemenu, select "Access Control", then "Groups"
3. Edit the Group
4. Search for the EPerson & add/remove them from that group.
To debug issues for a specific user, it's possible to login as (or "impersonate") that user account.
On the backend, first you MUST enable the "assumelogin" feature. This feature is disabled by default. Update this setting in your local.cfg or
dspace.cfg.
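A minimal sketch of that setting, assuming the property keeps its historical name webui.user.assumelogin:

```
# In local.cfg (or dspace.cfg) on the backend: enable the "assume login" feature
webui.user.assumelogin = true
```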
One of the options --email or --netid is required to name the record. The complete options are:
-a --add required
-n --netid "netid" (a username in an external system such as a directory – see Authentication Methods for details)
-g --givenname First or given name
To list accounts:
-L --list required
To modify an account:
-M --modify required
To delete an account:
-d --delete required
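Putting the options above together, typical invocations might look like the following. The --surname and --password flags are assumptions by analogy with --givenname; check `dspace user --help` on your installation for the exact flag set:

```shell
# Hypothetical examples; flags other than those listed above are assumptions.
[dspace]/bin/dspace user --add --email jsmith@example.com --givenname Jane --surname Smith --password s3cret
[dspace]/bin/dspace user --list
[dspace]/bin/dspace user --modify --email jsmith@example.com --givenname Janet
[dspace]/bin/dspace user --delete --email jsmith@example.com
```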
The Groomer
This tool inspects all user accounts for several conditions.
The output is a tab-separated-value table of the EPerson ID, last login date, email address, netid, and full name for each matching account.
Cryptographic properties
The cryptographic properties used for generating the salted hashes that protect stored user passwords can be found and adjusted in:
https://github.com/DSpace/DSpace/blob/main/dspace-api/src/main/java/org/dspace/eperson/PasswordHash.java
Email Subscriptions
Introduction
Adding new subscriptions
Managing your subscriptions
Enable sending out emails
Introduction
This feature is available in 7.5 or later.
Registered users can subscribe to communities or collections in DSpace. After subscribing, users will receive a regular email containing the new and
modified items in the communities/collections they are subscribed to.
In the User interface, browse to the Community or Collection you wish to subscribe to, and click on the Subscribe button.
After clicking that button, you'll see a popup window which allows you to select the frequency of subscription you'd like.
Daily: Receive a daily email of Items under the Community/Collection which have been updated in the last day.
Weekly: Receive a weekly email of Items under the Community/Collection which have been updated in the last week.
Monthly: Receive a monthly email of Items under the Community/Collection which have been updated in the last month.
From this page, you are able to see all your current Community/Collection subscriptions. You can choose to edit or delete any in the list.
NOTE: Until you enable the "subscription-send" script, users will not receive the email updates for their subscriptions. It is HIGHLY RECOMMENDED to
enable this script via Scheduled Tasks via Cron. See sample settings on that page.
To send out the subscription emails you MUST invoke the subscription-send script from the DSpace command-line or Processes UI. It is advised to
set up this script as a Scheduled Task via Cron. See sample settings on that page.
This script requires the "-f" (--frequency) parameter with a value of "D" (Daily), "W" (Weekly), or "M" (Monthly). Keep in mind, you will want to schedule it to
run on a Daily, Weekly and Monthly basis to send the appropriate emails.
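For example, the scheduling could be sketched as crontab entries like the following (the times are arbitrary, and [dspace] stands for your installation directory):

```
# Daily at 01:00, weekly on Sundays at 02:00, monthly on the 1st at 03:00
0 1 * * *  [dspace]/bin/dspace subscription-send -f D
0 2 * * 0  [dspace]/bin/dspace subscription-send -f W
0 3 1 * *  [dspace]/bin/dspace subscription-send -f M
```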
Request a Copy
Introduction
Requesting a copy using the User Interface
(Optional) Requesting a copy with Help Desk workflow
Email templates
Configuration parameters
Selecting Request a Copy strategy
Configure who gets request via a metadata field
Configure all requests to go to a helpdesk email
Configure all requests to go to the administrators of a Collection
Combine multiple strategies
Introduction
Supported in 7.1 or above
Request a Copy was not available in DSpace 7.0. It was restored in DSpace 7.1. See DSpace Release 7.0 Status
The request a copy functionality was added to DSpace as a measure to facilitate access in those cases when uploaded content cannot be openly shared
with the entire world immediately after submission into DSpace. It gives users an efficient way to request access from the original submitter of the item,
who can approve this access with the click of a button. This practice complies with most applicable policies as the submitter interacts directly with the
requester on a case by case basis.
The request form asks the user for his or her name, email address, and a message in which the reason for requesting access can be entered.
After clicking "Request copy" at the bottom of this form, the original submitter of the item will receive an email containing the details of the request. The
email also contains a link with a token that brings the original submitter to a page where he or she can either grant or reject access. If the original submitter
can not evaluate the request, he or she can forward this email to the right person, who can use the link containing the token without having to log into
DSpace.
Each of these buttons registers the choice of the submitter, displaying the following form in which an additional reason for granting or rejecting the access
can be added.
After hitting send, the contents of this form will be sent together with the associated files to the email address of the requester. In case the access is
rejected, only the reason will be sent to the requester.
While responding positively to a request for copy, the person who approved may also ask the repository administrator to alter the access rights of the item,
allowing unrestricted open access to everyone, by checking "Change to open access".
As of 7.6, the HelpDesk workflow can be performed without requiring authentication (issue #8636 has been fixed)
The (optional) Request Item with HelpDesk intermediary workflow is geared towards having your Repository Support staff act as a helpdesk that receives
all incoming RequestItem requests and then processes them. This adds an "Initial Reply to Requestor" option, to let the requestor know that their request
is being worked on, and an "Author Permission Request" option, which allows the helpdesk to email the author of the document; not all documents are
deposited by the author, or the author may need to be tracked down by support staff, as DSpace might not have their current email address.
The Author Permission Request includes information about the original request (requester name, requester email, requester's reason for requesting). The
author/submitter's name and email address will be pre-populated in the form from the submitter, but both are editable, as the submitters of content to
DSpace aren't always the authors.
Email templates
Most of the email templates used by Request a Copy are treated just like other email templates in DSpace. The templates can be found in the
/config/emails directory and can be altered just by changing the contents and restarting Tomcat.
request_item.admin: template for the message that will be sent to the administrator of the repository, after the original submitter requests to have the
admin permissions changed for this item.
request_item.author: template for the message that will be sent to the original submitter of an item with the request for copy.
The templates for the emails that the requester receives, which may have been customized by the approver in the aforementioned dialog, are not managed
as separate email template files. These defaults are stored in the Messages.properties file under the following keys:
itemRequest.response.body.reject: default message for informing the requester of the rejection
Configuration parameters
Request a copy is enabled by default. These configuration parameters in dspace.cfg relate to Request a Copy:
Property: request.item.type
Informational Note: This parameter manages who can file a request for an item. The parameter is optional. When it is empty or commented out, request a
copy is disabled across the entire repository. When set to all, any user can file a request for a copy. When set to logged, only registered users can
file a request for copy.

Property: mail.helpdesk
Informational Note: The email address assigned to this parameter will receive the emails both for granting or rejecting request a copy requests, as well as
requests to change item policies. This parameter is optional. If it is empty or commented out, it will default to mail.admin.
WARNING: This setting is only utilized if the RequestItemHelpdeskStrategy bean is enabled in [dspace]/config/spring/api/requestitem.xml (see below)

Property: request.item.helpdesk.override
Informational Note: Should all Request Copy emails go to the mail.helpdesk instead of the item submitter? Default is false, which sends Item Requests to
the item submitter.
WARNING: This setting is only utilized if the RequestItemHelpdeskStrategy bean is enabled in [dspace]/config/spring/api/requestitem.xml (see below)
New in DSpace 7
The strategy is selected by configuring it into the <bean/> for RequestItemEmailNotifier as a constructor argument.
Syntax for 7.6 or later

<!-- This bean sends various emails between the requestor and the grantor.
     Its constructor-arg selects the strategy (RequestItemMetadataStrategy is the default). -->
<bean class="org.dspace.app.requestitem.RequestItemEmailNotifier" lazy-init='false'>
    <description>This sends various emails between the requestor and the grantor.</description>
    <constructor-arg name='requestItemAuthorExtractor'
                     ref='org.dspace.app.requestitem.RequestItemMetadataStrategy'/>
</bean>

<!-- This bean allows you to specify which metadata field is used (if any) -->
<bean class="org.dspace.app.requestitem.RequestItemMetadataStrategy"
      id="org.dspace.app.requestitem.RequestItemMetadataStrategy">
    <!--
    Uncomment these properties if you want to look up in metadata the email and the name
    of the author to contact for request copy.
    If you don't configure that, or if the requested item doesn't have these metadata,
    the submitter data are used as failover.
    -->
</bean>

Syntax for 7.5 or earlier

<!-- This alias defines that you want to use the RequestItemMetadataStrategy (this is enabled by default) -->
<alias alias='org.dspace.app.requestitem.RequestItemAuthorExtractor'
       name='org.dspace.app.requestitem.RequestItemMetadataStrategy'/>

<!-- This bean allows you to specify which metadata field is used (if any) -->
<bean class="org.dspace.app.requestitem.RequestItemMetadataStrategy"
      id="org.dspace.app.requestitem.RequestItemMetadataStrategy"
      autowireCandidate="true">
    <!--
    Uncomment these properties if you want to look up in metadata the email and the name
    of the author to contact for request copy.
    If you don't configure that, or if the requested item doesn't have these metadata,
    the submitter data are used as failover.
    -->
</bean>
1. Create a metadata field which you'd like to use to store this email address (and optionally a second metadata field for the full name).
a. Hint: You may wish to add this metadata field to your "metadata.hide.*" settings in local.cfg in order to ensure this metadata field is
hidden from normal users & is only visible to Administrative users. That way this email address will NOT appear in Item display pages
(except to Administrators)
2. Uncomment the "emailMetadata" setting above, and configure its "value" to use the new metadata field.
3. Edit the Item(s) which you wish to use this field. Add the new metadata field to those items, giving it a value of the email address which will receive
the request for copy. By default, if an Item does NOT have this metadata field, the request for copy will still go to the Item's submitter.
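As a sketch of the hint in step 1a, assuming a hypothetical field named local.requestcopy.email, the local.cfg entry would follow the usual metadata.hide pattern:

```
# Hide the hypothetical local.requestcopy.email field from non-Administrators
metadata.hide.local.requestcopy.email = true
```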
Another common request strategy is to use a single Helpdesk email address to receive all of these requests (see the corresponding helpdesk configs in
dspace.cfg above). If you wish to use the Helpdesk Strategy, you must replace the references to the default RequestItemMetadataStrategy bean
with the RequestItemHelpdeskStrategy bean:
Syntax for 7.6 or later

<!-- To change the settings, you need to modify the constructor-arg (see below) to use the
     "RequestItemHelpdeskStrategy" bean. -->
<bean class="org.dspace.app.requestitem.RequestItemEmailNotifier" lazy-init='false'>
    <description>This sends various emails between the requestor and the grantor.</description>
    <constructor-arg name='requestItemAuthorExtractor'
                     ref='org.dspace.app.requestitem.RequestItemHelpdeskStrategy'/>
</bean>

Configure all requests to go to the administrators of a Collection

To route all requests to the administrators of the Item's owning Collection, reference the CollectionAdministratorsRequestItemStrategy bean instead:

Syntax for 7.6 or later

<!-- To change the settings, you need to modify the constructor-arg (see below) to use the
     "CollectionAdministratorsRequestItemStrategy" bean. -->
<bean class="org.dspace.app.requestitem.RequestItemEmailNotifier" lazy-init='false'>
    <description>This sends various emails between the requestor and the grantor.</description>
    <constructor-arg name='requestItemAuthorExtractor'
                     ref='org.dspace.app.requestitem.CollectionAdministratorsRequestItemStrategy'/>
</bean>
In the following example, email will be sent to the address(es) found in the configured metadata fields (or to the submitter if none), and to the owning
collection's administrators.
Syntax for 7.6 or later
<!-- To change the settings, you need to modify the constructor-arg (see below) to use the
     "CombiningRequestItemStrategy" bean. -->
<bean class="org.dspace.app.requestitem.RequestItemEmailNotifier" lazy-init='false'>
    <description>This sends various emails between the requestor and the grantor.</description>
    <constructor-arg name='requestItemAuthorExtractor'
                     ref='org.dspace.app.requestitem.CombiningRequestItemStrategy'/>
</bean>

<!-- This bean is where you can combine multiple strategies by referencing them in the <list> below -->
<bean class='org.dspace.app.requestitem.CombiningRequestItemStrategy'
      id='org.dspace.app.requestitem.CombiningRequestItemStrategy'>
    <constructor-arg>
        <description>A list of references to RequestItemAuthorExtractor beans</description>
        <list>
            <ref bean='org.dspace.app.requestitem.RequestItemMetadataStrategy'/>
            <ref bean='org.dspace.app.requestitem.CollectionAdministratorsRequestItemStrategy'/>
        </list>
    </constructor-arg>
</bean>

Syntax for 7.5 or earlier

<!-- This bean is where you can combine multiple strategies by referencing them in the <list> below -->
<bean class='org.dspace.app.requestitem.CombiningRequestItemStrategy'
      id='org.dspace.app.requestitem.CombiningRequestItemStrategy'
      autowireCandidate='true'>
    <constructor-arg>
        <description>A list of references to RequestItemAuthorExtractor beans</description>
        <list>
            <ref bean='org.dspace.app.requestitem.RequestItemMetadataStrategy'/>
            <ref bean='org.dspace.app.requestitem.CollectionAdministratorsRequestItemStrategy'/>
        </list>
    </constructor-arg>
</bean>
CAPTCHA Verification
This feature is available starting from DSpace 7.4
This feature, when enabled, offers a powerful additional layer of protection against unwanted behaviors such as mass registrations performed by
bots using random or stolen email addresses. The feature can be enabled or disabled by the DSpace instance administrator, and is based on Google
reCAPTCHA.
The supported reCAPTCHA versions are v2, with both invisible (https://developers.google.com/recaptcha/docs/invisible) and checkbox
(https://developers.google.com/recaptcha/docs/display) verification modes, and v3 (https://developers.google.com/recaptcha/docs/v3).
Prerequisites
Before enabling the feature, a valid site key and secret key pair should be obtained from the Google reCAPTCHA system, by registering the DSpace
application on which verification will be set in the reCAPTCHA admin panel (https://www.google.com/recaptcha/admin).
To enable the feature, set:
registration.verification.enabled = true
If v2 of Google reCAPTCHA is to be enabled, these properties must also be set in your configuration files:
google.recaptcha.version = v2
google.recaptcha.mode = <invisible or checkbox depending on which mode is wanted>
google.recaptcha.key.site = <your site here>
google.recaptcha.key.secret = <your secret here>
If v3 is to be enabled instead, set:
google.recaptcha.version = v3
google.recaptcha.key.site = <your site here>
google.recaptcha.key.secret = <your secret here>
google.recaptcha.site-verify = https://www.google.com/recaptcha/api/siteverify
google.recaptcha.key.threshold = <score threshold>
google.recaptcha.mode = invisible
The google.recaptcha.key.threshold property relates to the reCAPTCHA verification logic: v3 assigns a score to each request made against the
verification APIs, in this case by the DSpace system during the registration process. reCAPTCHA v3 returns a score between 0.0 and 1.0 (1.0 is very likely
a good interaction, 0.0 is very likely a bot). By default, a good threshold could be 0.5. For further information, see
https://developers.google.com/recaptcha/docs/v3#interpreting_the_score
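Conceptually, the verification decision is a simple comparison of the returned score against the configured threshold. A sketch (this is illustrative shell, not DSpace's actual Java logic):

```shell
# Sketch: accept a registration only if the v3 score meets the configured threshold.
check_score() {
  # $1 = score from Google's siteverify response, $2 = google.recaptcha.key.threshold
  awk -v s="$1" -v t="$2" 'BEGIN { exit !(s >= t) }'
}

check_score 0.7 0.5 && echo "request accepted (likely human)"
check_score 0.2 0.5 || echo "request rejected (likely bot)"
```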
Once the feature is enabled, user registration will only be performed if the CAPTCHA token, passed in the registration payload, is verified during the
registration process itself and is considered valid. Each registration request, even one made using the DSpace REST APIs, must have a CAPTCHA token
in its header.
A new type of cookie has been added to the DSpace cookie set, "Registration and Password Recovery". This cookie is only offered when CAPTCHA
verification is enabled.
This cookie option must be enabled by users before registering, otherwise they won't be able to perform a registration.
Configurable Entities
Introduction
Default Entity Models
Research Entities
Journals
Enabling Entities
1. Configure your entity model (optionally)
2. Import entity model into the database
3. Configure Collections for each Entity type
4. Configure Submission Forms for each Entity type
4.1 Use of collection-entity-type attribute for default Submission forms per Entity Type
5. Configure Workflow for each Entity type (optionally)
6. Configure Virtual Metadata to display for related Entities (optionally)
Designing your own Entity model
Thinking about the object model
Configuring the object model
Configuring the metadata fields
Configuring the item display pages
Configuring virtual metadata
Configuring discovery
Additional Technical Details
Tilted relationships
Use case: OrgUnit vs Publication relationship
Impact of the tilted relationships
Versioning Support
Example of the latest status of a relationship (technical details)
Metadata fields that represent relations
Configure versioning for an entity type
Introduction
DSpace users have expressed the need for DSpace to be able to provide more support for different types of digital objects related to open access
publications, such as authors/author profiles, data sets etc. Configurable Entities are designed to meet that need.
In DSpace, an Entity is a special type of Item which often has Relationships to other Entities. Breaking it down with more details...
Entities and their Relationships are also completely configurable. DSpace provides some sample models out of the box, which you can use directly or
adapt as needed.
The Entity model also has similarities with the Portland Common Data Model (PCDM), with an Entity roughly mapping to a "pcdm:Object" and existing
Communities and Collections roughly mapping to a "pcdm:Collection". However, at this time DSpace Entities concentrate more on building a graph
structure of relationships, instead of a tree structure.
Research Entities
Research Entities include Person, OrgUnit, Project and Publication. They allow you to create author profiles (Person) in DSpace, and relate those people
to their department(s) (OrgUnit), grant project(s) (Project) and works (Publication).
Each publication can link to projects, people and org units
Each person can link to projects, publications and org units
Each project can link to publications, people and org units
Each org unit can link to projects, people and publications
Journals
Journal Entities include Journal, Journal Volume, Journal Issue and Publication (article). They allow you to represent a Journal hierarchy more easily
within DSpace, starting at the overall Journal, consisting of multiple Volumes, and each Volume containing multiple Issues. Issues then link to all articles
(Publication) which were a part of that journal issue.
NOTE: this model includes the same "Publication" entity as the Research Entities model described above. This Entity overlap allows you to link an
article (Publication) both to its author (Person) and to the Journal Issue it appeared in.
Enabling Entities
By default, Entities are not used in DSpace. But, as described above, several models are available out-of-the-box that may be optionally enabled.
Keep in mind, there are a few DSpace import/export features that do not yet support Entities in DSpace 7.0. These will be coming in future 7.x
releases. See DSpace Release 7.0 Status for prioritization information, etc.
AIP Backup and Restore does not fully support entity types or relationships. In other words, Entities are only represented as normal Items in AIPs.
Importing and Exporting Items via Simple Archive Format does not fully support entity types or relationships. In other words, Entities are only
represented as normal Items in SAF. (Note: early work to bring this support has already begun in https://github.com/DSpace/DSpace/pull/3322)
SWORDv1 Server and SWORDv2 Server do not yet support Entity or relationship creation.
You can also design your own model from scratch (see "Designing your own model" section below). So, feel free to start by modifying relationship-
types.xml, or creating your own model based on the relationship-types.dtd.
If an Entity type (of the same name) already exists, it will be updated with any new relationships defined in relationship-types.xml
If an Entity type (of the same name) doesn't exist, the new Entity type will be created along with its relationships defined in relationship-types.xml
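The updates described above are applied by (re-)running the initialize-entities script. A sketch of the invocation (the file path shown assumes the out-of-the-box location of relationship-types.xml; yours may differ):

```shell
# Apply (or update) the Entity model defined in relationship-types.xml.
# Re-run this command any time the model configuration changes.
[dspace]/bin/dspace initialize-entities -f [dspace]/config/entities/relationship-types.xml
```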
All valid Entity Types are stored in the "entity_type" database table.
All Relationship type definitions are stored in the "relationship_type" database table
All Relationships between entities get stored in the "relationship" table.
Entities themselves are stored alongside Items in the 'item' table. Every Entity must have a "dspace.entity.type" metadata field whose value is a
valid Entity Type (from the "entity_type" table).
Keep in mind, your currently enabled Entity model is defined in your database, and NOT in the "relationship-types.xml". Anytime you want to update your
data model, you'd update/create a configuration (like relationship-types.xml) and re-run the "initialize-entities" command.
1. Create at least one Collection for each Entity Type needing a custom Submission form. For example, a Collection for "Person" entities, and a
separate one for "Publication" entities.
2. Edit the Collection. On the "Edit Metadata" page, use the "Entity Type" dropdown to select the Entity Type for this Collection.
a. This "Entity Type" selection will ensure that every Item submitted to this collection is automatically assigned that Entity type. So, it ties
this Collection to that type of Entity (i.e. no other type of Entity can be submitted to this Collection).
i. NOTE: Entity Type is currently not modifiable after being set. This is because changing the Entity type may result in odd
behavior (or errors) with in-progress submissions (as they will continue to use the old Entity Type). If you really need to modify
the Entity Type, you can do so by changing the "dspace.entity.type" metadata value on the Collection object. At this time,
changing that metadata field would need to be done at the database level.
b. NOTE: In 7.0, this "Entity Type" dropdown did not exist. In that release, you had to create a "Template Item" from that page. In the
Template Item, add a single metadata field "dspace.entity.type". Give it a value matching the Entity type (e.g. Publication, Person,
Project, OrgUnit, Journal, JournalVolume, JournalIssue). This value IS CASE SENSITIVE and it MUST match the Entity type name
defined in relationship-types.xml
i. As of 7.1 (or above), if you previously created a Template Item in 7.0, the "dspace.entity.type" field value will be migrated to the
"Entity Type" dropdown automatically (via a database migration).
3. In the Edit Collection page, switch to the "Assign Roles" tab and create a "Submitters" group. Add any people who should be allowed to
submit/create this new Entity type.
a. If you only want Administrators to create this Entity type, you can skip this step. Administrators can submit to any Collection.
4. If you want to hide this Collection, you can choose to only make it visible to that same Submitters group (or Administrators). This does NOT hide
the Entities from search or browse, but it will hide the Collection itself.
a. In the Edit Collection page, switch to the "Authorizations" tab.
b. Add a new Authorization of TYPE_CUSTOM, restricting "READ" to the Submitters group created above (or Administrators if there is no
Submitters group). You can also add multiple READ policies as needed. WARNING: The Submitters group MUST have READ
privileges to be able to submit/create new Entities.
c. Remove the default READ policy giving Anonymous permissions.
d. Assuming you want the Entities to still be publicly available, make sure the DEFAULT_ITEM_READ policy is set to "Anonymous"!
Obviously, how you organize your Entity Types into Collections is up to you. You can create a single Collection for all Entities of that type (e.g. an "Author
Profiles" collection could be where all "Person" Entities are submitted/stored). Or, you could create many Collections for each Entity Type (e.g. each
Department in your University may have its own Community, and underneath have a "Staff Profiles" Collection where all "Person" Entities for that
department are submitted/stored). A few example structures are shown below.
Department of Architecture
Building Technology Program
Theses - Department of Architecture
Department of Biology
Theses - Biology
People
Projects
OR
Department of Architecture
Building Technology Program
Theses - Department of Architecture
People in Department of Architecture
Projects in Department of Architecture
Department of Biology
Theses - Biology
People in Department of Biology
Projects in Department of Biology
Books
Book Chapter
Edited Volume
Monograph
Theses
Bachelor Thesis
Doctoral Thesis
Habilitation Thesis
Master Thesis
People
Projects
On the backend, you will now need to modify the [dspace]/config/item-submission.xml to "map" this Collection (or Collections) to the submission
process for this Entity type.
DSpace comes with sample submission forms for each Entity type.
The sample <submission-process> is defined in item-submission.xml and named based on the Entity type (e.g. Publication,
Person, Project, etc).
The metadata fields captured for each Entity are defined in a custom step in submission-forms.xml, and named in the format
"[entityType]Step" (where the entity type is camelcased). For example: "publicationStep", "personStep", "projectStep".
Optionally, modify those sample submission forms. See Submission User Interface for hints/tips on customizing the item-submission.xml or
submission-forms.xml files
As of 7.6, you can simply map each Entity Type to a specific submission form as follows in your item-submission.xml (This section already
exists, just uncomment it)
WARNING: If you create a new Collection using a specific Entity Type, you must currently restart your servlet container (e.g. Tomcat) for
the submission form configuration to take effect for the new Collection. This is the result of a known bug where the Submission forms
are cached until the servlet container is restarted. See this issue ticket: https://github.com/DSpace/DSpace/issues/7985
In 7.5 and earlier, you needed to map each Collection's handle one by one to a Submission form in item-submission.xml. Map your
Collection's handle (findable on the Collection homepage) to the submission form you want it to use. In the below example, we've
mapped a single Collection to each of the out-of-the-box Entity types.
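A sketch of such a 7.5-style mapping in item-submission.xml (the handles below are placeholders; substitute your own Collections' handles):

```xml
<submission-map>
    <!-- Map each Collection's handle to the sample submission process for its Entity type -->
    <name-map collection-handle="123456789/101" submission-name="Publication"/>
    <name-map collection-handle="123456789/102" submission-name="Person"/>
    <name-map collection-handle="123456789/103" submission-name="Project"/>
    <name-map collection-handle="123456789/104" submission-name="OrgUnit"/>
</submission-map>
```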
Once your modifications to the submission process are complete, you will need to quickly reboot Tomcat (or your servlet container) to reload the current
settings.
4.1 Use of collection-entity-type attribute for default Submission forms per Entity Type
As an alternative to a collection's Handle, the Entity Type can be used as an attribute. Instead of specifying the collection handle, use the
collection-entity-type attribute with the Entity Type to use (e.g. Person, Project). Please note that your Collections with that Entity Type must be
previously created.
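A sketch of such a mapping, matching the commented-out section shipped in item-submission.xml as of 7.6:

```xml
<submission-map>
    <!-- Map every Collection with a given Entity Type to the matching submission process -->
    <name-map collection-entity-type="Publication" submission-name="Publication"/>
    <name-map collection-entity-type="Person" submission-name="Person"/>
    <name-map collection-entity-type="Project" submission-name="Project"/>
    <name-map collection-entity-type="OrgUnit" submission-name="OrgUnit"/>
</submission-map>
```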
Once your modifications to the submission process are complete, you will need to quickly reboot Tomcat (or your servlet container) to reload the current
settings.
For the DSpace 7.6 release, a Tomcat restart is required for every new collection
Due to the way SubmissionConfigReader is loaded into memory (during an initialization process), there is currently no implemented way to reload submission forms.
So, every time you assign an entity type to a collection, or create a new collection with an associated entity type, you will need to restart Tomcat for
that collection to be available in the item submission config. There is an ongoing fix for that.
DSpace 7.6.1 introduced a fix, so you no longer need to do a Tomcat restart
DSpace 7.6.1 adds a way to reload Submission Configs, so you no longer need to restart Tomcat after creating a new collection with an entity type,
or assigning one to an existing collection.
See Configurable Workflow for more information on configuring workflows per Collection.
Virtual Metadata is configurable for all Entities and all relationships. DSpace comes with default settings for its default Entity model, and those can be
found in [dspace]/config/spring/api/virtual-metadata.xml. In that Spring Bean configuration file, you'll find a map of each relationship type
to a metadata field & its value. Here's a summary of how it works:
The "org.dspace.content.virtual.VirtualMetadataPopulator" bean maps every Relationship type (from relationship-types.xml) to a <util:
map> definition (of a given ID) also in the virtual-metadata.xml
That <util:map> definition defines which DSpace metadata field will store the virtual metadata. It also links to the bean which will dynamically
define the value of this metadata field.
<!-- In this example, isAuthorOfPublication will be displayed in the "dc.contributor.author" field -->
<!-- The *value* of that field will be defined by the "publicationAuthor_author" bean -->
<util:map id="isAuthorOfPublicationMap">
<entry key="dc.contributor.author" value-ref="publicationAuthor_author"/>
</util:map>
A bean of that ID then defines the value of the field, based on the related Entity. In this example, these fields are pulled from the related Person
entity and concatenated. If the Person has "person.familyName=Jones" and "person.givenName=Jane", then the value of "dc.contributor.author"
on the related Publication will be dynamically set to "Jones, Jane".
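The value bean referenced by the map is defined in the same file. A simplified sketch of the out-of-the-box "publicationAuthor_author" bean (property names follow the shipped virtual-metadata.xml; check your version's file for the exact definition):

```xml
<!-- Builds "familyName, givenName" from the related Person entity -->
<bean class="org.dspace.content.virtual.Concatenate" id="publicationAuthor_author">
    <property name="fields">
        <util:list>
            <value>person.familyName</value>
            <value>person.givenName</value>
        </util:list>
    </property>
    <!-- separator placed between the concatenated field values -->
    <property name="separator">
        <value>, </value>
    </property>
</bean>
```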
If the default Virtual Metadata looks good to you, no changes are needed. If you make any changes, be sure to restart Tomcat to update the bean
definitions.
Thinking about the object model
First step: identify the entity types
Which types of objects would you want to create items for: e.g. Person, Publication, JournalVolume
Be careful not to confuse a type with a relationship. A Person is a type, an author is a relationship between the publication and the person
Which relationship types would you want to create between the entity items from the previous step: e.g. isAuthorOfPublication,
isEditorOfPublication, isProjectOfPublication, isOrgUnitOfPerson, isJournalIssueOfPublication
Multiple relationships between the same 2 types can be created: isAuthorOfPublication, isEditorOfPublication
Relationships are automatically bidirectional, so no need to worry about whether you want to display the authors in a publication or the
publications of an author
By creating a drawing of your model, you’ll be able to quickly verify whether anything is missing
Similar to the default relationship-types.xml, configure a relationship type per connection between 2 entity types
Include the 2 entity type names which are being connected.
Determine a clear and unambiguous name for the relation in both directions
Optionally: determine the cardinality (min/max occurrences) for the relationships
Optionally: determine default behavior for copying metadata if the relationship is deleted
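Following the conventions above, a single relationship type could be sketched like this in relationship-types.xml (cardinality and copy behavior are optional; the element names match the out-of-the-box file):

```xml
<type>
    <!-- the two entity types being connected -->
    <leftType>Publication</leftType>
    <rightType>Person</rightType>
    <!-- a clear, unambiguous name for each direction of the relation -->
    <leftwardType>isAuthorOfPublication</leftwardType>
    <rightwardType>isPublicationOfAuthor</rightwardType>
    <!-- optional cardinality: each side may have zero or more relations -->
    <leftCardinality>
        <min>0</min>
    </leftCardinality>
    <rightCardinality>
        <min>0</min>
    </rightCardinality>
    <!-- optional: whether to copy virtual metadata to real metadata when the relationship is deleted -->
    <copyToLeft>false</copyToLeft>
    <copyToRight>false</copyToRight>
</type>
```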
Dublin Core works for publications, but not for a Person, JournalVolume, …
There are many standards which can be easily configured: schema.org, eurocris, datacite, …
Pick a schema which suits your needs
Add a form in submission-forms.xml for each entity type, containing the relevant metadata fields
See also Submission User Interface documentation.
Configure which relationships to create
The isAuthorOfPublication relationship can be displayed for the Publication item as dc.contributor.author
The isOrgUnitOfPerson relationship can be displayed for the Person item as organization.legalName
This can be configured in virtual-metadata.xml
Configuring discovery
Configure the discovery facets, filters, sort options, …
The facets for a Person can be job title, organization, project, …
The filters for a Person can be person.familyName, person.givenName, …
We are working on pulling that information into this Wiki space as a final home, but currently some technical details exist only in that document.
Tilted relationships
The tilted relationships are a default DSpace 7 feature, developed in https://github.com/DSpace/DSpace/pull/3134
It’s designed to improve performance when an entity has 1000s of relationships. It will avoid loading the relationships in the configured direction unless
explicitly requested to retrieve them. It’s mainly used for setups where there are so many related items that it doesn’t make sense to list them all, and they
are rather made available via search. Using tilted relationships, those setups can get a big performance boost.
When setting tilted to left, where the left entity type is Publication and the right entity type is OrgUnit, you are specifying that the Publication
should still load the related OrgUnits (expected to be a small number anyway), but the OrgUnit is not supposed to load all the related Publications.
When checking the OrgUnit metadata, it won't populate the relation.isPublicationOfOrgUnit virtual metadata
When checking the OrgUnit relationships, it won't return the relationships based on this type
But using the relationship search angular component, the related Publications can still be displayed on the OrgUnit item page
It is also still possible to use the relationship list angular component on the OrgUnit item page without a performance impact, but for the OrgUnit
entities, the pageable list doesn't have much added value
When checking the Publication metadata, it will still populate the relation.isOrgUnitOfPublication virtual metadata, and can also
populate any other virtual metadata such as the OrgUnit name on the Publication
When checking the Publication relationships, it will include the relationships based on this type
The relationship list angular component on the Publication item page can be used, and won't have any issues
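In relationship-types.xml, a tilted relationship is declared with a tilted element on the relationship type. A sketch for the Publication-OrgUnit case above (other elements abbreviated):

```xml
<type>
    <leftType>Publication</leftType>
    <rightType>OrgUnit</rightType>
    <leftwardType>isOrgUnitOfPublication</leftwardType>
    <rightwardType>isPublicationOfOrgUnit</rightwardType>
    <!-- only the left side (Publication) loads its related items by default;
         the OrgUnit will not load all its related Publications -->
    <tilted>left</tilted>
</type>
```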
In general:
Retrieving the item REST representation (which includes virtual metadata) for items with 600 relationships, using tilted relationships,
reduced the duration to 5% (a 95% reduction in time spent)
Creating new relationships on items with 1000 relationships or more was also faster when using tilted relationships; more recent tests
showed the duration reduced to 25% (a 75% reduction in time spent)
There are production environments with single entities having over 40k relationships of a single relationship type; using tilted relationships, this
setup doesn't impact performance.
Using the OrgUnit to Person relationship, it’s possible to have an OrgUnit with 40k Person entities, and the virtual metadata of the Person entity contains
the Org Unit’s name, ID, ….
Versioning Support
DSpace entities fully support versioning. For the most part, this works like any other item. For example, when creating a new version of an item, a new
item is created and all metadata values of the preceding item are copied over to the new item. Special care was taken to version relationships between
entities.
To understand how versioning between entities with relationships works, let's walk through the following example:
Consider Volume 1.1 (left side) and Issue 1.1 (right side). Both are archived and both are the first version. Note that on the arrow, representing the relation
between the volume and the issue, two booleans and two numbers are indicated.
The boolean on side (v) is true if and only if volume 1.1 is the latest version that is relevant to issue 1.1 (even though it may be possible that
volume 1.2, the second version of volume 1, exists). This means that on the item page of issue 1.1, a link to the item page of volume 1.1 should
be displayed. It also means that searching for (the uuid of) issue 1.1 should yield volume 1.1.
The boolean on side (i) is true if and only if issue 1.1 is the latest version that is relevant to volume 1.1 (even though it may be possible that
issue 1.2, the second version of issue 1, exists). This means that on the item page of volume 1.1, a link to the item page of issue 1.1 should be
displayed. It also means that searching for (the uuid of) volume 1.1 should yield issue 1.1.
The number on side (v) indicates the place at which the virtual metadata representing this relationship (if any) will appear on volume 1.1. E.g.
using the out-of-the-box configuration in virtual-metadata.xml, metadata field publicationissue.issueNumber of issue 1.1 would
appear as metadata field publicationissue.issueNumber on volume 1.1 on place 0 (i.e. as the first metadata value).
The number on side (i) indicates the place at which the virtual metadata representing this relationship (if any) will appear on issue 1.1. E.g.
using the out-of-the-box configuration in virtual-metadata.xml, metadata field publicationvolume.volumeNumber of volume 1.1 would
appear as metadata field publicationvolume.volumeNumber on issue 1.1 on place 0 (i.e. as the first metadata value).
With the groundwork out of the way, let's see what happens when we create a new version of volume 1.1. The new version is not yet archived, because it
still has to be edited in the submission UI.
At this moment, when viewing the item page of issue 1.1, the user should only see volume 1.1 (as volume 1.2 is not yet archived). When viewing the item
page of volume 1.1, nothing has changed: only a link to issue 1.1 will appear. When viewing the item page of volume 1.2 (e.g. as an admin), a link to issue
1.1 will appear as well.
As soon as volume 1.2 is deposited (archived), the "latest status" of both volume 1.1 and volume 1.2 are updated. When viewing the item page of issue
1.1, volume 1.2 should be visible. When viewing the item pages of the volumes, nothing has changed.
Only the relationship with volume 1.3 is copied. For issue 1.1, no relationship was displayed with volume 1.1 and 1.2. (The relationships still exist in the
database, but are not visible in the UI.) For volume 1.1, a relationship to issue 1.1 remains present, but it should not be updated to issue 1.2. For issue
1.2, these relationships are no longer relevant, so they are not copied.
On the item pages of volume 1.1, volume 1.2 and volume 1.3, you should see issue 1.1 (as 1.2 is not archived yet)
Because issue 1.2 is not yet archived, all volumes are still pointing to issue 1.1. Let's archive it:
Now on the item pages of volume 1.1 and volume 1.2, you should see issue 1.1; it's the latest issue at the time that those volumes were superseded by
volume 1.3. On the item page of volume 1.3, you'll see issue 1.2. On the item page of issue 1.1 you'll still see volume 1.3 as well.
Metadata fields of the first category (relation.*) contain all uuids of related items that the current item can see. I.e. a relationship has to exist between
the current item and the other item, and the other item needs to have "latest status" for that specific relationship.
Item issue 1.1 will contain metadata field relation.isJournalVolumeOfIssue with the uuid of volume 1.3 as its value. Volume 1.1 and 1.2 are not
included because they don't have "latest status" on the relevant relationships.
Metadata fields of the second category (relation.*.latestForDiscovery) contain all uuids of the items for which the current item is visible. I.e. a
relationship has to exist between the current item and the other item, and the current item needs to have "latest status" for that specific relationship. These
fields are particularly important for indexing and search, because they allow us to surface all the items that a particular item is referring to.
Continuing on the example above, issue 1.1 will have metadata field relation.isJournalVolumeOfIssue.latestForDiscovery containing the
uuids of volume 1.1 and 1.2.
With issue 1.1 containing volume 1.1 and 1.2 in relation.isJournalVolumeOfIssue.latestForDiscovery, a search on the volume 1.1 page for
all issues containing volume 1.1 will display issue 1.1 thanks to this setup.
1. when introducing a relationship type, make sure to add four new metadata fields to config/registries/relationship-formats.xml. E.g.
relation.isAuthorOfPublication, relation.isAuthorOfPublication.latestForDiscovery,
relation.isPublicationOfAuthor and relation.isPublicationOfAuthor.latestForDiscovery
2. when introducing an entity type, filter items on latestVersion:true in discovery.xml. This will be the default search, which ensures older
versions are not shown
If you want to show all related items, including older versions, you can create another discovery config without latestVersion:true.
This should be used for item pages displaying the related items to the current item using the discovery search.
The entity types configured out-of-the-box have discovery config <entity-type> and discovery config <entity-type>Relationships
for that purpose.
Note that versioning support is enabled by default, but can be turned off by setting versioning.enabled = false in versioning.cfg or local.cfg
. For more details on item versioning, see: https://wiki.lyrasis.org/display/DSDOC7x/Item+Level+Versioning.
Curation System
DSpace supports running curation tasks, which are described in this section. DSpace includes several useful tasks out-of-the-box, but the system also is
designed to allow new tasks to be added between releases, both general purpose tasks that come from the community, and locally written and deployed
tasks.
1 Tasks
2 Activation
3 Task Invocation
3.1 On the command line
3.2 In the admin UI
3.3 In workflow
3.4 In arbitrary user code
4 Asynchronous (Deferred) Operation
5 Task Output and Reporting
5.1 Status Code
5.2 Result String
5.3 Reporting Stream
6 Task Properties
7 Task Parameters
8 Scripted Tasks
Tasks
The goal of the curation system ("CS") is to provide a simple, extensible way to manage routine content operations on a repository. These operations are
known to CS as "tasks", and they can operate on any DSpaceObject (i.e. subclasses of DSpaceObject) - which means the entire Site, Communities,
Collections, and Items - viz. core data model objects. Tasks may elect to work on only one type of DSpace object - typically an Item - and in this case they
may simply ignore other data types (tasks have the ability to "skip" objects for any reason). The DSpace core distribution will provide a number of useful
tasks, but the system is designed to encourage local extension - tasks can be written for any purpose, and placed in any java package. This gives DSpace
sites the ability to customize the behavior of their repository without having to alter - and therefore manage synchronization with - the DSpace source code.
What sorts of activities are appropriate for tasks?
Some examples:
apply a virus scan to item bitstreams (this will be our example below)
profile a collection based on format types - good for identifying format migrations
ensure a given set of metadata fields are present in every item, or even that they have particular values
call a network service to enhance/replace/normalize an item's metadata or content
ensure all item bitstreams are readable and their checksums agree with the ingest values
Since tasks have access to, and can modify, DSpace content, performing tasks is considered an administrative function to be available only to
knowledgeable collection editors, repository administrators, sysadmins, etc. No tasks are exposed in the public interfaces.
Activation
For CS to run a task, the code for the task must of course be included with other deployed code (to [dspace]/lib, WAR, etc) but it must also be
declared and given a name. This is done via a configuration property in [dspace]/config/modules/curate.cfg as follows:
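For example, the out-of-the-box virus scan task could be activated like this (a sketch following the named-plugin syntax; "vscan" is the taskname used elsewhere in configuration):

```
# Activate tasks by mapping the fully qualified class name to a taskname
plugin.named.org.dspace.curate.CurationTask = org.dspace.ctask.general.ClamScan = vscan
plugin.named.org.dspace.curate.CurationTask = org.dspace.ctask.general.ProfileFormats = profileformats
plugin.named.org.dspace.curate.CurationTask = org.dspace.ctask.general.RequiredMetadata = requiredmetadata
```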
For each activated task, a key-value pair is added. The key is the fully qualified class name and the value is the taskname used elsewhere to configure the
use of the task, as will be seen below. Note that the curate.cfg configuration file, while in the config directory, is located under "modules". The intent is that
tasks, as well as any configuration they require, will be optional "add-ons" to the basic system configuration. Adding or removing tasks has no impact on
dspace.cfg.
For many tasks, this activation configuration is all that will be required to use it. But for others, the task needs specific configuration itself. A concrete
example is described below, but note that these task-specific configuration property files also reside in [dspace]/config/modules
Task Invocation
Tasks are invoked using CS framework classes that manage a few details (to be described below), and this invocation can occur wherever needed, but CS
offers great versatility "out of the box":
option meaning
-e epersonID (required) email address or netid of the E-Person performing the task
-i identifier ID of object to curate. May be (1) a Handle, (2) a workflow ID, or (3) 'all' to operate on the whole repository.
-l limit maximum number of objects in Context cache. If absent, unlimited objects may be added.
-s scope declare a scope for database transactions. Scope must be: (1) 'open' (default value), (2) 'curation' or (3) 'object'.
-r filename emit reporting to the named file. '-r -' writes reporting to standard out. If not specified, report is discarded silently.
-p name=value set the runtime task parameter 'name' to the value 'value'. May be repeated as needed. See "Task parameters" below.
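A sketch of a command-line invocation using the options above (the -t <taskname> option, which selects the task to perform, is assumed here; '-r -' sends the report to standard out):

```shell
[dspace]/bin/dspace curate -t vscan -i 123456789/4 -e admin@myuniversity.edu -r -
```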
As with other command-line tools, these invocations could be placed in a cron table and run on a fixed schedule, or run on demand by an administrator.
In the admin UI
In the UI, there are several ways to execute configured Curation Tasks:
1. From the "Curate" tab/button that appears on each "Edit Community/Collection/Item" page: this tab allows an Administrator, Community
Administrator or Collection Administrator to run a Curation Task on that particular Community, Collection or Item. When running a task on a
Community or Collection, that task will also execute on all its child objects, unless the Task itself states otherwise (e.g. running a task on a
Collection will also run it across all Items within that Collection).
NOTE: Community Administrators and Collection Administrators can only run Curation Tasks on the Community or Collection which they
administer, along with any child objects of that Community or Collection. For example, a Collection Administrator can run a task on that
specific Collection, or on any of the Items within that Collection.
2. From the Administrator's "Curation Tasks" page: This option is only available to DSpace Administrators, and appears in the Administrative
side-menu. This page allows an Administrator to run a Curation Task across a single object, or all objects within the entire DSpace site.
In order to run a task from this interface, you must enter the handle for the DSpace object. To run a task site-wide, you can use the
handle: [your-handle-prefix]/0
Each of the above pages exposes a drop-down list of configured tasks, with a button to 'perform' the task, or queue it for later operation (see section
below). Not all activated tasks need to appear in the Curate tab - you can filter them by means of a configuration property. This property also permits you to
assign the task a more user-friendly name than the PluginManager taskname. The property resides in [dspace]/config/modules/curate.cfg:
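A sketch of this property, following the same pattern as the taskgroup configuration shown further below:

```
# ui.tasknames lists the tasks to expose in the UI, each with a friendly display name
curate.ui.tasknames = profileformats = Profile Bitstream Formats
curate.ui.tasknames = requiredmetadata = Check for Required Metadata
```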
When a task is selected from the drop-down list and performed, the tab displays both a phrase interpreting the "status code" of the task execution, and the
"result" message if any has been defined. When the task has been queued, an acknowledgement appears instead. You may configure the words used for
status codes in curate.cfg (for clarity, language localization, etc):
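The shipped defaults look roughly like this (a sketch; check your curate.cfg for the exact values):

```
# map each status code to the phrase displayed in the UI
curate.ui.statusmessages = -3 = Unknown Task
curate.ui.statusmessages = -2 = No Status Set
curate.ui.statusmessages = -1 = Error
curate.ui.statusmessages = 0 = Success
curate.ui.statusmessages = 1 = Fail
curate.ui.statusmessages = 2 = Skip
curate.ui.statusmessages = other = Invalid Status
```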
Report output from tasks run in this way is collected by configuring a Reporter plugin. You must have exactly one Reporter configured. The default is to
use the FileReporter, which writes a single report of the output of all tasks in the run over all of the selected objects, to a file in the reports directory
(configured as report.dir). See [DSpace]/config/modules/submission-configuration.cfg for the value of plugin.single.org.dspace.
curate.Reporter. Other Reporter implementations are provided, or you may supply your own.
As the number of tasks configured for a system grows, a simple drop-down list of all tasks may become too cluttered or large. DSpace 1.8+ provides a
way to address this issue, known as task groups. A task group is a simple collection of tasks that the Admin UI will display in a separate drop-down list.
You may define as many or as few groups as you please. If no groups are defined, then all tasks that are listed in the ui.tasknames property will appear in
a single drop-down list. If at least one group is defined, then the admin UI will display two drop-down lists. The first is the list of task groups, and the
second is the list of task names associated with the selected group. A few key points to keep in mind when setting up task groups:
The configuration of groups follows the same simple pattern as tasks, using properties in [dspace]/config/modules/curate.cfg. The group is
assigned a simple logical name, but also a localizable name that appears in the UI. For example:
# ui.taskgroups contains the list of defined groups, together with a pretty name for UI display
curate.ui.taskgroups = replication = Backup and Restoration Tasks
curate.ui.taskgroups = integrity = Metadata Integrity Tasks
.....
# each group membership list is a separate property, whose value is comma-separated list of logical task names
curate.ui.taskgroup.integrity = profileformats, requiredmetadata
....
In workflow
CS provides the ability to attach any number of tasks to standard DSpace workflows. Using a configuration file [dspace]/config/workflow-
curation.xml, you can declaratively (without coding) wire tasks to any step in a workflow. An example:
<taskset-map>
<mapping collection-handle="default" taskset="cautious" />
</taskset-map>
<tasksets>
<taskset name="cautious">
<flowstep name="editstep">
<task name="vscan">
<workflow>reject</workflow>
<notify on="fail">$flowgroup</notify>
<notify on="fail">$colladmin</notify>
<notify on="error">$siteadmin</notify>
</task>
</flowstep>
</taskset>
</tasksets>
This markup would cause a virus scan to occur during the "editstep" of workflow for any collection, and automatically reject any submissions with infected
files. It would further notify (via email) both the reviewers ("editstep" role/group), and the collection administrators, if either of these are defined. If it could
not perform the scan, the site administrator would be notified.
The notifications use the same procedures that other workflow notifications do - namely email. There is a new email template defined for curation task use:
[dspace]/config/emails/flowtask_notify. This may be language-localized or otherwise modified like any other email template.
Tasks wired in this way are normally performed as soon as the workflow step is entered, and the outcome action (defined by the 'workflow' element)
immediately follows. It is also possible to delay the performance of the task - which can help keep the system responsive - by queuing the task instead of
directly performing it:
...
<taskset name="cautious">
<flowstep name="editstep" queue="workflow">
...
This attribute (which must always follow the "name" attribute in the flowstep element) will cause all tasks associated with the step to be placed on the
queue named "workflow" (or any queue you wish to use, of course), and further has the effect of suspending the workflow. When the queue is emptied
(meaning all tasks in it have been performed), the workflow is restarted. Each workflow step may be separately configured.
Like configurable submission, you can assign these task rules per collection, as well as having a default for any collection.
As with task invocation from the administrative UI, workflow tasks need to have a Reporter configured in submission-configuration.cfg.
Collection coll = (Collection)HandleManager.resolveToObject(context, "123456789/4");
Curator curator = new Curator();
curator.setReporter(System.out);
curator.addTask("vscan").curate(coll);
System.out.println("Result: " + curator.getResult("vscan"));
would do approximately what the command-line invocation did. The method "curate" simply performs all the tasks configured (you can add multiple tasks to a
curator).
The above directs report output to standard out. Any class which implements Appendable may be set as the reporter class.
Instead of performing tasks immediately, a Curator can place a request on a named queue (for example, "monthly") to be processed later. To read (and
process) the queue, we could use the command-line tool, but we could also read the queue programmatically. Any number of queues can be defined and used as needed.
In the administrative UI curation "widget", you can likewise either perform a task directly or place it on a queue for later processing.
Status Code
As mentioned above, a status code is returned to CS whenever a task is called.
In the administrative UI, this code is translated into the word or phrase configured by the ui.statusmessages property (discussed above) for display.
Result String
The task may define a string indicating details of the outcome. This result is displayed in the "curation widget" described above.
CS does not interpret or assign result strings; that is the task's responsibility. A task need not assign a result, but the "best practice" for tasks is to assign one whenever
possible.
Reporting Stream
For very fine-grained information, a task may write to a reporting stream. This stream may be sent to a file or to standard out, when running a task from the
command line. Tasks run from the administrative UI or a workflow use a configured Reporter class to collect report output. Your own code may collect the
report using any implementation of the Appendable interface. Unlike the result string, there is no limit to the amount of data that may be pushed to this
stream.
Task Properties
DSpace 1.8 introduces a new "idiom" for tasks that require configuration data. It is available to any task whose implementation extends
AbstractCurationTask, but is completely optional. There are a number of problems that task properties are designed to solve, but to make the discussion concrete we will
start with a particular one: the problem of hard-coded configuration file names. A task that relies on configuration data will typically encode a fixed
reference to a configuration file name. For example, the virus scan task reads a file called "clamav.cfg", which lives in [dspace]/config/modules. It
could look up its configuration properties in the ordinary way. But tasks are supposed to be written by anyone in the community and shared around
(without prior coordination), so if another task uses the same configuration file name, there is a name collision that can't be easily fixed, since the
reference is hard-coded in each task. In this case, if we wanted to use both at a given site, we would have to alter the source of one of them - which
introduces needless code localization and maintenance.
Task properties give us a simple solution. Here is how it works: suppose that both colliding tasks use the task properties facility instead of
ordinary configuration lookup. For example, each asks for the property clamav.service.host. At runtime, the curation system resolves this request
against a set of configuration properties, using the name the task has been configured under as the prefix of the property names. So, for example, if both were
installed (in, say, curate.cfg) as:
org.dspace.ctask.general.ClamAv = vscan,
org.community.ctask.ConflictTask = virusscan,
....
then the task property foo will resolve to the property named vscan.foo when called from ClamAv's code, but virusscan.foo when called from
ConflictTask's code. Note that "vscan" etc. are locally assigned names, so we can always prevent the "collisions" mentioned, and we make the tasks
much more portable, since we remove the "hard-coding" of config names.
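The prefix resolution described above can be sketched in plain Java. This is an illustrative standalone model, not DSpace's actual implementation; the property names and host values are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

public class TaskPropsDemo {
    // Hypothetical configuration, as if loaded from curate.cfg and module configs
    static final Map<String, String> config = new HashMap<>();
    static {
        config.put("vscan.service.host", "clam.example.org");      // ClamAv installed as "vscan"
        config.put("virusscan.service.host", "other.example.org"); // ConflictTask installed as "virusscan"
    }

    // Resolve a task-relative key by prefixing it with the task's installed name
    static String taskProperty(String taskName, String key) {
        return config.get(taskName + "." + key);
    }

    public static void main(String[] args) {
        // Both tasks ask for "service.host", but each sees its own value
        System.out.println(taskProperty("vscan", "service.host"));
        System.out.println(taskProperty("virusscan", "service.host"));
    }
}
```

Because the prefix is the locally assigned task name rather than a class or file name, two independently written tasks can use identical property keys without colliding.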
Another use of task properties is to support multiple task profiles. Suppose we have a task that we want to operate in one of two modes. A good example
would be a mediafilter task that produces a thumbnail. We can either create one if it doesn't exist, or run with "-force" which will create one regardless.
Suppose this behavior was controlled by a property in a config file. If we configured the task as "thumbnail", then we would have in (perhaps) [dspace]
/config/modules/thumbnail.cfg:
...other properties...
thumbnail.thumbnail.maxheight = 80
thumbnail.thumbnail.maxwidth = 80
thumbnail.forceupdate=false
The thumbnail-generating task code would then resolve "forceupdate" to see whether filtering should be forced.
But an obvious use-case would be to want to run force mode and non-force mode from the admin UI on different occasions. To do this, one would have to
stop Tomcat, change the property value in the config file, restart, and so on. However, task properties can elegantly rescue us here. All we need to
do is go into the config/modules directory and create a new file, perhaps called thumbnail.force.cfg. In this file, we put the properties:
thumbnail.force.thumbnail.maxheight = 80
thumbnail.force.thumbnail.maxwidth = 80
thumbnail.force.forceupdate=true
Then we add a new task (really just a new name, no new code) in curate.cfg:
org.dspace.ctask.general.ThumbnailTask = thumbnail
org.dspace.ctask.general.ThumbnailTask = thumbnail.force
Consider what happens: when we perform the task "thumbnail" (using taskProperties), it uses the thumbnail.* properties and operates in "non-force"
profile (since the value is false), but when we run the task "thumbnail.force" the curation system uses the thumbnail.force.* properties. Notice
that we did all this via local configuration - we have not had to touch the source code at all to obtain as many "profiles" as we would like.
See Task Properties in Curation Tasks for details of how properties are resolved in task code.
Task Parameters
New in DSpace 7, you can pass parameters to a task at invocation time. These runtime parameters will be presented to the task as if they were task
properties (see above) and, if present, will override the value of identically-named properties.
Scripted Tasks
The procedure to set up curation tasks in Jython is described on a separate page: Curation tasks in Jython
DSpace 1.8 includes limited (and somewhat experimental) support for deploying and running tasks written in languages other than Java. Since version 6,
Java has provided a standard way (API) to invoke so-called scripting or dynamic language code that runs on the java virtual machine (JVM). Scripted tasks
are those written in a language accessible from this API. The exact number of supported languages will vary over time, and the degree of maturity of each
language, or suitability of the language for curation tasks will also vary significantly. However, preliminary work indicates that Ruby (using the JRuby
runtime) and Groovy may prove viable task languages.
Support for scripted tasks does not include any DSpace pre-installation of the scripting language itself - this must be done according to the instructions
provided by the language maintainers, and typically only requires a few additional jars on the DSpace classpath. Once one or more languages have been
installed into the DSpace deployment, task support is fairly straightforward. One new property must be defined in [dspace]/config/modules/curate.cfg:
curate.script.dir = ${dspace.dir}/scripts
This merely defines the directory location (usually relative to the deployment base) where task script files should be kept. This directory will contain a
"catalog" of scripted tasks named task.catalog that contains information needed to run scripted tasks. Each task has a 'descriptor' property with value
syntax:
<engine>|<relFilePath>|<implClassCtor>
An example property for a link checking task written in Ruby might be:
linkchecker = ruby|rubytask.rb|LinkChecker.new
This descriptor means that a "ruby" script engine will be created, a script file named "rubytask.rb" in the directory <script.dir> will be loaded, and
the resolver will expect that evaluating "LinkChecker.new" will provide a correct implementation object. Note that the task must be configured in all
other ways just like Java tasks (in ui.tasknames, ui.taskgroups, etc.).
Script files may embed their descriptors to facilitate deployment. To accomplish this, a script must include the descriptor string with syntax:
$td=<descriptor> somewhere on a comment line. For example:
# My descriptor $td=ruby|rubytask.rb|LinkChecker.new
For reasons of portability, the <relFilePath> component may be omitted in this context. Thus, "$td=ruby||LinkChecker.new" will be expanded to a
descriptor with the name of the embedding file.
Bundled Tasks
DSpace bundles a small number of tasks of general applicability. Those that do not require configuration (or have usable default values) are activated by
default to demonstrate the use of the curation system. They may be deactivated by means of configuration, if desired, without affecting system integrity.
Those that require configuration may be enabled (activated) by editing the DSpace configuration files. Each task is briefly described in this section.
All bundled tasks are in the package org.dspace.ctask.general. So, for example, to activate the no-operation task, which is implemented in the
class NoOpCurationTask, one would configure:
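A configuration sketch, following the plugin pattern used for curation tasks in curate.cfg (the task name "noop" is an arbitrary local choice; check the exact property syntax against your DSpace version):

```
plugin.named.org.dspace.curate.CurationTask = org.dspace.ctask.general.NoOpCurationTask = noop
```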
Bitstream Format Profiler Task
The task with the taskname 'formatprofiler' (in the admin UI it is labeled "Profile Bitstream Formats") examines all the bitstreams in an item and produces a
table ("profile") which is assigned to the result string. It is activated by default, and is configured to display in the administrative UI. The result string has the
layout:
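For example, a profile might look like this (the formats and counts here are hypothetical):

```
10 (K) JPEG
 2 (S) Adobe PDF
 1 (U) RAD Video
```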
where the left column is the count of bitstreams of the named format and the letter in parentheses is an abbreviation of the repository-assigned support
level for that format:
U = Unsupported
K = Known
S = Supported
The profiler will operate on any DSpace object. If the object is an item, then only that item's bitstreams are profiled; if a collection, all the bitstreams of all
the items; if a community, all the items of all the collections of the community.
Link Checker Tasks
Two link checker tasks, BasicLinkChecker and MetadataValueLinkChecker, can be used to check for broken or unresolvable links appearing in item
metadata.
These tasks are intended as prototypes / examples for developers and administrators who are new to the curation system.
MetadataWebService Task
DSpace item metadata can contain any number of identifiers or other field values that participate in networked information systems. For example, an item
may include a DOI which is a controlled identifier in the DOI registry. Many web services exist to leverage these values, by using them as 'keys' to retrieve
other useful data. In the DOI case for example, CrossRef provides many services that given a DOI will return author lists, citations, etc. The
MetadataWebService task enables the use of such services, and allows you to obtain and (optionally) add to DSpace metadata the results of any web
service call to any service provider. You simply need to describe what service you want to call, and what to do with the results. Using the task code ([task
code]), you can create as many distinct tasks as you have services you want to call.
Each task description lives in a configuration file in 'config/modules' (or in your local.cfg), and is a simple properties file, like all other DSpace configuration
files (see Configuration Reference). All of the settings associated with a given task should be prepended with the task name (as assigned in
config/modules/curate.cfg). For example, if the task name is issn2pubname in curate.cfg, then all settings should start with "issn2pubname." Your
settings can either be set in your local.cfg, or in a new configuration file which is included (include = path/to/new/file.cfg) into either your
local.cfg or the dspace.cfg. See the Configuration Reference for examples of including configuration files, or modifying your local.cfg.
There are a few required properties you must configure for any service, and for certain services, a few additional ones. An example will illustrate best.
[taskcode].template=http://www.sherpa.ac.uk/romeo/api29.php?issn={dc.identifier.issn}
When the task runs, it will replace '{dc.identifier.issn}' with the value of that field in the item. If the field has multiple values, the first one will be used. As a
web service, the call to the above URL will return an XML document containing information (including the publisher name) about that ISSN. We need to
describe what to do with this response document, i.e. what elements we want to extract, and what to do with the extracted content. This description is
encoded in a property called the 'datamap'. Using the example service above we might have:
[taskcode].datamap=//publisher/name=>dc.publisher,//romeocolor
Each separate instruction is separated by a comma, so there are 2 instructions in this map. The first instruction essentially says: find the XML element
'publisher name' and assign the value or values of this element to the 'dc.publisher' field of the item. The second instruction says: find the XML element
'romeocolor', but do not add it to the DSpace item metadata - simply add it to the task result string (so that it can be seen by the person running the task).
You can have as many instructions as you like in a datamap, which means that you can retrieve multiple values from a single web service call. A little more
formally, each instruction consists of one to three parts. The first (mandatory) part identifies the desired data in the response document. The syntax (here '
//publisher/name') is an XPath 1.0 expression, which is the standard language for navigating XML trees. If the value is to be assigned to the DSpace item
metadata, then 2 other parts are needed. The first is the 'mapping symbol' (here '=>'), which is used to determine how the assignment should be made.
There are 3 possible mapping symbols, shown here with their meanings:
'->' mapping will add to any existing value(s) in the item field
'=>' mapping will replace any existing value(s) in the item field
'~>' mapping will add *only if* item field has no existing value(s)
The third part (here 'dc.publisher') is simply the name of the metadata field to be updated. These two mandatory properties (template and datamap) are
sufficient to describe a large number of web services. All that is required to enable this task is to edit 'config/modules/curate.cfg' (or your
local.cfg), and add 'issn2pubname' to the list of tasks:
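A configuration sketch, assuming the standard plugin pattern for registering curation tasks ('issn2pubname' is the locally assigned task name):

```
plugin.named.org.dspace.curate.CurationTask = org.dspace.ctask.general.MetadataWebService = issn2pubname
```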
If you wish the task to be available in the Admin UI, see the Invocation from the Admin UI documentation (above) about how to configure it. The remaining
sections describe some more specialized needs using the MetadataWebService task.
HTTP Headers
For some web services, protocol and other information is expressed not in the service URL, but in HTTP headers. Examples might be HTTP basic auth
tokens, or requests for a particular media type response. In these cases, simply add a property to the configuration file (our example was
'issn2pubname.cfg') containing all headers you wish to transmit to the service:
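For illustration (the '.headers' property name and the header values here are assumptions; verify the property name against your DSpace version):

```
issn2pubname.headers = Accept: text/xml||Cache-Control: no-cache
```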
You can specify any number of headers, just separate them with a 'double-pipe' ('||'). Ensure that any commas in values are escaped (with backslash
comma, i.e. '\,').
Transformations
One potential problem with the simple parameter substitutions performed by the task is that the service might expect a different format or expression of a
value than the way it is stored in the item metadata. For example, a DOI service might expect a bare prefix/suffix notation ('10.000/12345'), whereas the
DSpace metadata field might have a URI representation ('http://dx.doi.org/10.000/12345'). In these cases one can declare a 'transformation' of a value in
the template. For example:
[taskcode].template=http://www.crossref.org/openurl/?id={doi:dc.relation.isversionof}&format=unixref
The 'doi:' prepended to the metadata field name declares that the value of the 'dc.relation.isversionof' field should be transformed before the substitution
into the template using a transformation named 'doi'. The transformation is itself defined in the same configuration file as follows:
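A sketch of such a declaration, reconstructed from the description that follows (the exact function syntax should be checked against your DSpace version):

```
[taskcode].transform.doi = match 10. trunc 60
```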
This would be read as: exclude the value string up to the occurrence of '10.', then truncate any characters after length 60. You may define as many
transformations as you want in any task, although generally 1 or 2 will suffice. The keywords 'match', 'trunc', etc. are names of 'functions' to be applied (in
the order entered). The currently available functions are:
When the task is run, if the transformation results in an invalid state (e.g. cutting more characters than there are in the value), the un-transformed value will
be used and the condition will be logged. Transformations may also be applied to values returned from the web service. That is, one can apply the
transformation to a value before assigning it to a metadata field. In this case, the declaration occurs in the datamap property, not the template:
[taskcode].datamap=//publisher/name=>shorten:dc.publisher,//romeocolor
Here the task will apply the 'shorten' transformation (which must be defined in the same config file) before assigning the value to 'dc.publisher'.
By configuring a custom separator for the result string, for example:
[taskcode].separator=||
it becomes easy to parse the result string and preserve spaces in the values. This use of the result string can be very powerful, since you are
essentially creating a map of returned values, which can then be used to populate a user interface, or exploited in any other way you wish (driving a
workflow, etc.).
MicrosoftTranslator Task
Microsoft Translator uses the Microsoft Translate API to translate metadata values from one source language into one or more target languages.
This task can be configured to process particular fields, and to use a default language if no authoritative language for an item can be found. A Bing API v2 key
is needed.
MicrosoftTranslator extends the more generic AbstractTranslator. This now seems wasteful, but a GoogleTranslator had also been written to extend
AbstractTranslator. Unfortunately, Google has announced the decommissioning of its free Translate API service, so that task hasn't been included in
DSpace's general set of curation tasks.
Translated fields are added in addition to any existing fields, with the target language code in the 'language' column. This means that running a task
multiple times over one item with the same configuration could result in duplicate metadata.
This task is intended as a prototype / example for developers and administrators who are new to the curation system.
#---------------------------------------------------------------#
#----------TRANSLATOR CURATION TASK CONFIGURATIONS--------------#
#---------------------------------------------------------------#
# Configuration properties used solely by MicrosoftTranslator #
# Curation Task (uses Microsoft Translation API v2) #
#---------------------------------------------------------------#
## Translation field settings
##
## Authoritative language field
## This will be read to determine the original language an item was submitted in
## Default: dc.language
translator.field.language = dc.language
NoOp Task
This task does absolutely nothing. It is intended as a starting point for developers and administrators wishing to learn more about the curation system.
Required Metadata Task
The "requiredmetadata" task examines item metadata and determines whether fields that the web submission (input-forms.xml) marks as required
are present. It sets the result string to indicate either that all required fields are present, or constructs a list of metadata elements that are required but
missing. When the task is performed on an item, it will display the result for that item. When performed on a collection or community, the task be performed
on each item, and will display the last item result. If all items in the community or collection have all required fields, that will be the last in the collection. If
the task fails for any item (i.e. the item lacks all required fields), the process is halted. This way the results for the 'failed' items are not lost.
Virus Scan Task
The "vscan" task performs a virus scan on the bitstreams of items using the ClamAV software product.
Clam AntiVirus is an open source (GPL) anti-virus toolkit for UNIX; a port for Windows is also available. The virus scanning curation task interacts with the
ClamAV virus scanning service to scan the bitstreams contained in items, reporting on any infections. Like other curation tasks, it can be run against a
container or item, in the GUI or from the command line. ClamAV should be installed according to the documentation at http://www.clamav.net, and should not be
installed in the DSpace installation directory. You may install it on the same machine as your DSpace installation, or on another machine which has been
configured properly.
NOTICE: The following directions assume there is a properly installed and configured clamav daemon. Refer to links above for more information about
ClamAV.
The Clam anti-virus database must be updated regularly to maintain the most current level of anti-virus protection. Please refer to the ClamAV
documentation for instructions about maintaining the anti-virus database.
DSpace Configuration
In [dspace]/config/modules/curate.cfg, activate the task:
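A configuration sketch (assuming the standard plugin pattern for curation tasks; the ClamScan class name and the 'vscan' task name follow the conventions used elsewhere in this chapter):

```
plugin.named.org.dspace.curate.CurationTask = org.dspace.ctask.general.ClamScan = vscan
```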
Optionally, add the vscan friendly name to the configuration to enable it in the administrative user interface.
clamav.service.host = 127.0.0.1
# Change if not running on the same host as your DSpace installation.
clamav.service.port = 3310
# Change if not using standard ClamAV port
clamav.socket.timeout = 120
# Change if longer timeout needed
clamav.scan.failfast = false
# Change only if items have large numbers of bitstreams
Finally, if desired, virus scanning can be enabled as part of the submission process upload-file step. In [dspace]/config/modules, edit the
configuration file submission-curation.cfg:
submission-curation.virus-scan = true
1. Click on the curation tab.
2. Select the option configured in ui.tasknames above.
3. Select Perform.
Command Line
Container | failfast = T | Report on 1st infected bitstream within an item / Scan all contained Items
Exporting Content and Metadata
General top level page to group all DSpace facilities for exporting content and metadata.
Linked (Open) Data
Introduction
Exchanging repository contents
Terminology
Linked (Open) Data Support within DSpace
Architecture / Concept
Install a Triple Store
Default configuration and what you should change
Configuration Reference
[dspace-source]/dspace/config/modules/rdf.cfg
[dspace-source]/dspace/config/modules/rdf/constant-data-*.ttl
[dspace-source]/dspace/config/modules/rdf/metadata-rdf-mapping.ttl
[dspace-source]/dspace/config/modules/rdf/fuseki-assembler.ttl
[dspace-source]/dspace/config/spring/api/rdf.xml
Maintenance
Introduction
Exchanging repository contents
Most sites on the Internet are oriented towards human consumption. While HTML may be a good format for presenting information to humans, it is not a
good format to export data in a way easy for a computer to work with. Like most software for building repositories, DSpace supports OAI-PMH as an
interface to expose the stored metadata. While OAI-PMH is well known in the field of repositories, it is rarely known elsewhere (e.g. Google retired its
support for OAI-PMH in 2008). The Semantic Web is a generic approach to publish data on the Internet together with information about its semantics. Its
application is not limited to repositories or libraries and it has a growing user base. RDF and SPARQL are W3C-released standards for publishing
structured data on the web in a machine-readable way. The data stored in repositories is particularly suited for use in the Semantic Web, as the metadata
are already available. They don't have to be generated or entered manually for publication as Linked Data. For most repositories, at least for Open Access
repositories, it is quite important to share their stored content. Linked Data is a big opportunity for repositories to present their content in a way that can
easily be accessed, interlinked and (re)used.
Terminology
We don't want to give a full introduction into the Semantic Web and its technologies here as this can be easily found in many places on the web.
Nevertheless, we want to give a short glossary of the terms used most often in this context to make the following documentation more readable.
Semantic Web: The term "Semantic Web" refers to the part of the Internet containing Linked Data. Just like the World Wide Web, the Semantic Web is
also woven together by links among the data.
Linked Data: Data in RDF, following the Linked Data Principles, are called Linked Data. The Linked Data Principles describe the expected behavior of
data publishers, who shall ensure that the published data are easy to find, easy to retrieve, can be linked easily and link to other data as well.
Linked Open Data: Linked Open Data is Linked Data published under an open license. There is no technical difference between Linked Data and Linked
Open Data (often abbreviated as LOD). It is only a question of the license used to publish it.
RDF, RDF/XML, Turtle, N-Triples, N3-Notation: RDF is an acronym for Resource Description Framework, a metadata model. Don't think of RDF as a
format, as it is a model. Nevertheless, there are different formats to serialize data following RDF. RDF/XML, Turtle, N-Triples and N3-Notation are probably
the most well-known formats to serialize data in RDF. While RDF/XML uses XML, Turtle, N-Triples and N3-Notation don't, and they are easier for humans
to read and write. When we use RDF in DSpace configuration files, we currently prefer Turtle (but the code should be able to deal with any serialization).
Triple Store: A triple store is a database to natively store data following the RDF model. Just as you have to provide a relational database for DSpace,
you have to provide a triple store for DSpace if you want to use the LOD support.
SPARQL: The SPARQL Protocol and RDF Query Language is a family of protocols to query triple stores. Since version 1.1, SPARQL can be used to
manipulate triple stores as well: to store, delete or update data in triple stores. DSpace uses the SPARQL 1.1 Graph Store HTTP Protocol and the
SPARQL 1.1 Query Language to communicate with the triple store. The SPARQL 1.1 Query Language is often referred to simply as SPARQL, so expect
the SPARQL 1.1 Query Language if no other specific protocol out of the SPARQL family is explicitly specified.
SPARQL endpoint: A SPARQL endpoint is a SPARQL interface of a triple store. Since SPARQL 1.1, a SPARQL endpoint can be either read-only, allowing
only to query the stored data; or readable and writable, allowing to modify the stored data as well. When talking about a SPARQL endpoint without
specifying which SPARQL protocol is used, an endpoint supporting the SPARQL 1.1 Query Language is meant.
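As a small illustration of the Turtle serialization mentioned above — the kind of content that appears in DSpace's RDF configuration files — here is a sketch with a hypothetical item URI and metadata values:

```
@prefix dcterms: <http://purl.org/dc/terms/> .

<http://hdl.handle.net/123456789/4>
    dcterms:title "An example item" ;
    dcterms:publisher "Example University" .
```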
Architecture / Concept
To publish content stored in DSpace as Linked (Open) Data, the data have to be converted into RDF. The conversion into RDF has to be configurable as
different DSpace instances may use different metadata schemata, different persistent identifiers (DOI, Handle, ...) and so on. Depending on the content to
convert, configuration and other parameters, conversion may be time-intensive and impact performance. Content of repositories is much more often read
than created, deleted or changed, because the main goal of repositories is to safely store their contents. For this reason, the content stored within DSpace
is converted and stored in a triple store immediately after it is created or updated. The triple store serves as a cache and provides a SPARQL endpoint to
make the converted data accessible using SPARQL. The conversion is triggered automatically by the DSpace event system and can be started manually
using the command line interface – both cases are documented below. There is no need to back up the triple store, as all data stored in the triple store can
be recreated from the contents stored elsewhere in DSpace (in the assetstore(s) and the database). Besides the SPARQL endpoint, the data should be
published as RDF serializations as well. With dspace-rdf, DSpace offers a module that loads converted data from the triple store and provides it as an RDF
serialization. It currently supports RDF/XML, Turtle and N-Triples.
Repositories use Persistent Identifiers to make content citable and to address content. Following the Linked Data Principles, DSpace uses a Persistent
Identifier in the form of HTTP(S) URIs, converting a Handle to http://hdl.handle.net/<handle> and a DOI to http://dx.doi.org/<doi>. Altogether, DSpace
Linked Data support spans all three layers: the storage layer with a triple store, the business logic with classes to convert stored contents into RDF, and the
application layer with a module to publish RDF serializations. Just as DSpace allows you to choose Oracle or PostgreSQL as the relational database, you
may choose between different triple stores. The only requirements are that the triple store must support the SPARQL 1.1 Query Language and the SPARQL 1.1
Graph Store HTTP Protocol, which DSpace uses to store, update, delete and load converted data in/out of the triple store, and that it can provide the data
over a SPARQL endpoint.
The triple store should contain only data that are public, because the DSpace access restrictions won't affect the SPARQL endpoint. For this reason,
DSpace converts only archived, discoverable (non-private) Items, Collections and Communities which are readable for anonymous users. Please consider
this while configuring and/or extending DSpace Linked Data support.
The org.dspace.rdf.conversion package contains the classes used to convert the repository content to RDF. The conversion itself is done by plugins. The
org.dspace.rdf.conversion.ConverterPlugin interface is really simple, so take a look at it if you can program in Java and want to extend the conversion. The
only important thing is that plugins must only create RDF that can be made publicly available, as the triple store provides it through a SPARQL endpoint for
which the DSpace access restrictions do not apply. Plugins converting metadata should check whether a specific metadata field needs to be protected or
not (see org.dspace.app.util.MetadataExposure on how to check that). The MetadataConverterPlugin is heavily configurable (see below) and is used to
convert the metadata of Items. The StaticDSOConverterPlugin can be used to add static RDF triples (see below). The SimpleDSORelationsConverterPlugin
creates links between items and collections, collections and communities, subcommunities and their parents, and between top-level communities and the
information representing the repository itself.
As different repositories use different persistent identifiers to address their content, different algorithms to create the URIs used within the converted data can be implemented. Currently, HTTP(S) URIs of the repository (called local URIs), Handles and DOIs can be used. See the configuration part of this document for further information. If you want to add another algorithm, take a look at the org.dspace.rdf.storage.URIGenerator interface.
If you use the configuration provided with DSpace, make Fuseki connect to localhost only by using the argument --localhost when launching it! The configuration contains a writable SPARQL endpoint that allows any connection to change or delete the content of your triple store.
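Assuming Fuseki's own launch script and the assembler file shipped with DSpace, the invocation might look like the following sketch (exact paths depend on where you installed Fuseki and DSpace):

```shell
# Launch Fuseki bound to the loopback device only, using the
# assembler configuration shipped with DSpace
./fuseki-server --localhost --config=[dspace]/config/modules/rdf/fuseki-assembler.ttl
```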
Use Apache mod_proxy, mod_rewrite or any other appropriate web server/proxy to make localhost:3030/dspace/sparql readable from the internet. Use the address under which it is accessible as the address of your public SPARQL endpoint (see the property public.sparql.endpoint in the configuration reference below).
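A minimal Apache httpd sketch for exposing only the read-only query endpoint could look like this (module setup and paths are assumptions; adapt them to your web server):

```apache
# Expose only the read-only SPARQL 1.1 Query Language endpoint.
# The writable /dspace/data endpoint stays reachable from localhost only.
ProxyPass        /sparql http://localhost:3030/dspace/sparql
ProxyPassReverse /sparql http://localhost:3030/dspace/sparql
```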
The configuration provided within DSpace stores the files for the triple store under [dspace-install]/triplestore. Using this configuration, Fuseki provides three SPARQL endpoints: two read-only endpoints and one that can be used to change the data of the triple store. You should not use this configuration if you let Fuseki connect to the internet directly, as it would make it possible for anyone to delete, change or add information to the triple store. The option --localhost tells Fuseki to listen only on the loopback device. You can use Apache mod_proxy or any other web or proxy server to make the read-only SPARQL endpoint accessible from the internet. With the configuration described, Fuseki listens on port 3030 using HTTP. Using the address http://localhost:3030/ you can connect to the Fuseki Web UI. http://localhost:3030/dspace/data addresses a writable SPARQL 1.1 Graph Store HTTP Protocol endpoint, and http://localhost:3030/dspace/get a read-only one. Under http://localhost:3030/dspace/sparql a read-only SPARQL 1.1 Query Language endpoint can be found. The first of these endpoints must not be accessible from the internet, while the last one should be publicly accessible.
First, you'll want to ensure the Linked Data endpoint is enabled/configured. In your local.cfg, add rdf.enabled = true. You can optionally change its path by setting rdf.path (it defaults to "rdf", which means the Linked Data endpoint is available at [dspace.server.url]/rdf/, where dspace.server.url is also specified in your local.cfg).
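For example, a minimal local.cfg addition might be:

```
rdf.enabled = true
# optional; defaults to "rdf"
rdf.path = rdf
```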
In the file [dspace]/config/dspace.cfg, look for the property event.dispatcher.default.consumers and add rdf to it. This makes DSpace update the triple store automatically as the publicly available content of the repository changes.
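For example (the other consumer names shown here are illustrative; keep whatever consumers your dspace.cfg already lists and append rdf):

```
event.dispatcher.default.consumers = versioning, discovery, eperson, rdf
```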
As the Linked Data support of DSpace is highly configurable this section gives a short list of things you probably want to configure before using it. Below
you can find more information on what is possible to configure.
In the file [dspace]/config/modules/rdf.cfg you want to configure the address of the public SPARQL endpoint and the address of the writable endpoint DSpace uses to connect to the triple store (the properties rdf.public.sparql.endpoint and rdf.storage.graphstore.endpoint). In the same file you want to configure the URL that addresses the dspace-rdf module, which depends on where you deployed it (property rdf.contextPath), and switch content negotiation on (set property rdf.contentNegotiation.enable = true).
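Taken together, these rdf.cfg settings might look like the following sketch (the public hostname is a placeholder):

```
rdf.contentNegotiation.enable = true
rdf.contextPath = ${dspace.baseUrl}/rdf
rdf.public.sparql.endpoint = https://repository.example.org/sparql
rdf.storage.graphstore.endpoint = http://localhost:3030/dspace/data
```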
In the file [dspace]/config/modules/rdf/constant-data-general.ttl you should change the links to the Web UI of the repository and the publicly readable SPARQL endpoint. The URL of the public SPARQL endpoint should point to a URL that is proxied by a webserver to the triple store. See the section Install a Triple Store above for further information.
In the file [dspace]/config/modules/rdf/constant-data-site.ttl you may add any triples that should be added to the description of the
repository itself.
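As an illustration, a hypothetical constant-data-site.ttl adding a title and publisher for the repository could look like this (the predicates and the subject convention shown are assumptions for the sketch; the file shipped with DSpace shows the expected form):

```turtle
@prefix dcterms: <http://purl.org/dc/terms/> .

# <> stands for the resource being described (here, the repository itself)
<> dcterms:title "Example Institutional Repository" ;
   dcterms:publisher "Example University" .
```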
If you want to change the way metadata fields are converted, take a look at the file [dspace]/config/modules/rdf/metadata-rdf-mapping.ttl. This is also the place to add information on how to map metadata fields that you added to DSpace. There is already a quite acceptable default configuration for the metadata fields which DSpace supports out of the box. If you want to use specific prefixes in RDF serializations that support prefixes, you have to edit [dspace]/config/modules/rdf/metadata-prefixes.ttl.
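The prefix file simply declares Turtle prefixes, e.g. (a sketch; add whichever vocabularies your mappings use):

```turtle
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix bibo: <http://purl.org/ontology/bibo/> .
```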
Configuration Reference
There are several configuration files to configure DSpace's LOD support. The main configuration file can be found under [dspace-source]/dspace
/config/modules/rdf.cfg. Within DSpace we use Spring to define which classes to load. For DSpace's LOD support this is done within [dspace-
source]/dspace/config/spring/api/rdf.xml. All other configuration files are positioned in the directory [dspace-source]/dspace/config
/modules/rdf/. Configurations in rdf.cfg can be modified directly, or overridden via your local.cfg config file (see Configuration Reference). You'll
have to configure where to find and how to connect to the triple store. You may configure how to generate URIs to be used within the generated Linked
Data and how to convert the contents stored in DSpace into RDF. We will guide you through the configuration file by file.
[dspace-source]/dspace/config/modules/rdf.cfg
Property: rdf.enabled
Example Value: rdf.enabled = true
Informational Note: Defines whether the RDF endpoint is enabled or disabled (disabled by default). If enabled, the RDF endpoint is available at ${dspace.server.url}/${rdf.path}. Changing this value requires rebooting your servlet container (e.g. Tomcat).

Property: rdf.path
Example Value: rdf.path = rdf
Informational Note: Defines the path of the RDF endpoint, if enabled. For example, a value of "rdf" (the default) means the RDF interface/endpoint is available at ${dspace.server.url}/rdf (e.g. if "dspace.server.url = http://localhost:8080/server", then it'd be available at "http://localhost:8080/server/rdf"). Changing this value requires rebooting your servlet container (e.g. Tomcat).
Property: rdf.contentNegotiation.enable
Example Value: rdf.contentNegotiation.enable = true
Informational Note: Defines whether content negotiation should be activated. Set this to true if you use Linked Data support.

Property: rdf.contextPath
Example Value: rdf.contextPath = ${dspace.baseUrl}/rdf
Informational Note: Content negotiation needs to know where to refer clients that ask for RDF serializations of content stored within DSpace. This property sets the URL where the dspace-rdf module can be reached on the Internet (depending on how you deployed it).

Property: rdf.public.sparql.endpoint
Example Value: rdf.public.sparql.endpoint = http://${dspace.baseUrl}/sparql
Informational Note: Address of the read-only public SPARQL endpoint supporting SPARQL 1.1 Query Language.

Property: rdf.storage.graphstore.endpoint
Example Value: rdf.storage.graphstore.endpoint = http://localhost:3030/dspace/data
Informational Note: Address of a writable SPARQL 1.1 Graph Store HTTP Protocol endpoint. This address is used to create, update and delete converted data in the triple store. If you use Fuseki with the configuration provided as part of DSpace, you can leave this as it is. If you use another triple store or configure Fuseki on your own, change this property to point to a writable SPARQL endpoint supporting the SPARQL 1.1 Graph Store HTTP Protocol.
Property: rdf.storage.graphstore.authentication
Example Value: rdf.storage.graphstore.authentication = no
Informational Note: Defines whether to use HTTP Basic authentication to connect to the writable SPARQL 1.1 Graph Store HTTP Protocol endpoint.

Properties: rdf.storage.graphstore.login, rdf.storage.graphstore.password
Example Values:
rdf.storage.graphstore.login = dspace
rdf.storage.graphstore.password = ecapsd
Informational Note: Credentials for HTTP Basic authentication, if it is necessary to connect to the writable SPARQL 1.1 Graph Store HTTP Protocol endpoint.

Property: rdf.storage.sparql.endpoint
Example Value: rdf.storage.sparql.endpoint = http://localhost:3030/dspace/sparql
Informational Note: Besides a writable SPARQL 1.1 Graph Store HTTP Protocol endpoint, DSpace needs a SPARQL 1.1 Query Language endpoint, which can be read-only. This property allows you to set the address used to connect to such a SPARQL endpoint. If you leave this property empty, the property ${rdf.public.sparql.endpoint} will be used instead.

Properties: rdf.storage.sparql.authentication, rdf.storage.sparql.login, rdf.storage.sparql.password
Example Values:
rdf.storage.sparql.authentication = yes
rdf.storage.sparql.login = dspace
rdf.storage.sparql.password = ecapsd
Informational Note: As for the SPARQL 1.1 Graph Store HTTP Protocol endpoint, you can configure DSpace to use HTTP Basic authentication to authenticate against the (read-only) SPARQL 1.1 Query Language endpoint.
Property: rdf.converter.DSOtypes
Informational Note: Defines which kinds of DSpaceObjects should be converted. Bundles and Bitstreams are converted as part of the Item they belong to. Don't add EPersons here unless you really know what you are doing. All converted data is stored in the triple store, which provides a publicly readable SPARQL endpoint, so all data converted into RDF is exposed publicly. Every DSO type you add here must have an HTTP URI to be referenced in the generated RDF, which is another reason not to add EPersons here currently.

Properties:
rdf.constant.data.GENERAL
rdf.constant.data.COLLECTION
rdf.constant.data.COMMUNITY
rdf.constant.data.ITEM
rdf.constant.data.SITE
Example Values:
rdf.constant.data.GENERAL = ${dspace.dir}/config/modules/rdf/constant-data-general.ttl
rdf.constant.data.COLLECTION = ${dspace.dir}/config/modules/rdf/constant-data-collection.ttl
rdf.constant.data.COMMUNITY = ${dspace.dir}/config/modules/rdf/constant-data-community.ttl
rdf.constant.data.ITEM = ${dspace.dir}/config/modules/rdf/constant-data-item.ttl
rdf.constant.data.SITE = ${dspace.dir}/config/modules/rdf/constant-data-site.ttl
Informational Note: These properties define files to read static data from. The data should be in RDF; by default Turtle is used as the serialization. The data in the file referenced by ${rdf.constant.data.GENERAL} will be included in every entity that is converted to RDF, e.g. to point to the address of the publicly readable SPARQL endpoint or to name the institution running DSpace. The other properties define files that will be included if a DSpace object of the specified type (collection, community, item or site) is converted. This makes it possible to add static content to every Item, every Collection, and so on.

Property: rdf.metadata.mappings
Example Value: rdf.metadata.mappings = ${dspace.dir}/config/modules/rdf/metadata-rdf-mapping.ttl
Informational Note: Defines the file that contains the mappings for the MetadataConverterPlugin. See below for a description of the configuration file [dspace-source]/dspace/config/modules/rdf/metadata-rdf-mapping.ttl.
Property: rdf.metadata.schema
Example Value: rdf.metadata.schema = file://${dspace.dir}/config/modules/rdf/metadata-rdf-schema.ttl
Informational Note: Configures the URL used to load the RDF Schema of the DSpace Metadata RDF Mapping Vocabulary. Using a file:// URI makes it possible to convert DSpace content without having an internet connection. The version of the schema has to be the right one for the code in use; currently version 0.2.0 is used. This schema can also be found at http://digital-repositories.org/ontologies/dspace-metadata-mapping/0.2.0, and the newest version at http://digital-repositories.org/ontologies/dspace-metadata-mapping/.
Property: rdf.metadata.prefixes
Example Value: rdf.metadata.prefixes = ${dspace.dir}/config/modules/rdf/metadata-prefixes.ttl
Informational Note: If you want to use prefixes in RDF serializations that support prefixes, you can define these prefixes in the file referenced by this property.

Property: rdf.simplerelations.prefixes
Example Value: rdf.simplerelations.prefixes = ${dspace.dir}/config/modules/rdf/simple-relations-prefixes.ttl
Informational Note: If you want to use prefixes in RDF serializations that support prefixes, you can define these prefixes in the file referenced by this property.

Property: rdf.simplerelations.site2community
Informational Note: Defines the predicates used to link from the data representing the whole repository to the top level communities. Defining multiple predicates separated by commas will result in multiple triples.

Property: rdf.simplerelations.community2site
Informational Note: Defines the predicates used to link from the top level communities to the data representing the whole repository. Defining multiple predicates separated by commas will result in multiple triples.
Property: rdf.simplerelations.community2subcommunity
Informational Note: Defines the predicates used to link from communities to their subcommunities. Defining multiple predicates separated by commas will result in multiple triples.

Property: rdf.simplerelations.subcommunity2community
Informational Note: Defines the predicates used to link from subcommunities to the communities they belong to. Defining multiple predicates separated by commas will result in multiple triples.

Property: rdf.simplerelations.community2collection
Informational Note: Defines the predicates used to link from communities to their collections. Defining multiple predicates separated by commas will result in multiple triples.

Property: rdf.simplerelations.collection2community
Informational Note: Defines the predicates used to link from collections to the communities they belong to. Defining multiple predicates separated by commas will result in multiple triples.
Property: rdf.simplerelations.collection2item
Informational Note: Defines the predicates used to link from collections to their items. Defining multiple predicates separated by commas will result in multiple triples.

Property: rdf.simplerelations.item2collection
Informational Note: Defines the predicates used to link from items to the collections they belong to. Defining multiple predicates separated by commas will result in multiple triples.

Property: rdf.simplerelations.item2bitstream
Informational Note: Defines the predicates used to link from items to their bitstreams. Defining multiple predicates separated by commas will result in multiple triples.
[dspace-source]/dspace/config/modules/rdf/constant-data-*.ttl
As described in the documentation of the configuration file [dspace-source]/dspace/config/modules/rdf.cfg, the constant-data-*.ttl files can be used to add static RDF to the converted data. The data are written in Turtle, but if you change the file suffix (and the paths to the files in rdf.cfg) you can use any other RDF serialization you like. You can use this, for example, to add a link to the publicly readable SPARQL endpoint, add a link to the repository homepage, or add a triple to every community or collection defining it as an entity of a specific type such as a bibo:collection. The content of the file [dspace-source]/dspace/config/modules/rdf/constant-data-general.ttl will be added to every DSpaceObject that is converted; the content of constant-data-community.ttl to every community; the content of constant-data-collection.ttl to every collection; and the content of constant-data-item.ttl to every Item. You can use the file [dspace-source]/dspace/config/modules/rdf/constant-data-site.ttl to specify data representing the whole repository.
[dspace-source]/dspace/config/modules/rdf/metadata-rdf-mapping.ttl
This file should contain several metadata mappings. A metadata mapping defines how to map a specific metadata field within DSpace to a triple that will be added to the converted data. The MetadataConverterPlugin uses these metadata mappings to convert the metadata of an item into RDF. For every metadata field and value it checks whether any of the specified mappings matches. If one does, the plugin creates the specified triple and adds it to the converted data. In the file you'll find a lot of examples on how to define such a mapping.
For every mapping, a metadata field name has to be specified, e.g. dc.title or dc.identifier.uri. In addition you can specify a condition that is matched against the field's value. The condition is specified as a regular expression (using the syntax of the Java class java.util.regex.Pattern). If a condition is defined, the mapping will be used only on fields whose values match the regex defined as the condition.
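A mapping in metadata-rdf-mapping.ttl might look roughly like the following sketch (the property names follow the DSpace Metadata RDF Mapping Vocabulary as this author understands it; treat the examples in the shipped file as authoritative):

```turtle
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dm: <http://digital-repositories.org/ontologies/dspace-metadata-mapping/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

# Map every dc.title value to a dc:title triple on the converted object.
[] a dm:DSpaceMetadataRDFMapping ;
    dm:metadataName "dc.title" ;
    dm:creates [
        rdf:subject dm:DSpaceObjectIRI ;
        rdf:predicate dc:title ;
        rdf:object dm:DSpaceValue
    ] .
```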
The triple created by a mapping is specified using reified RDF statements. The DSpace Metadata RDF Mapping Vocabulary defines some placeholders that can be used. The most important placeholder is dm:DSpaceObjectIRI, which is replaced by the URI used to identify the entity being converted to RDF. That means that if a specific Item is converted, the URI used to address this Item in RDF will be used instead of dm:DSpaceObjectIRI. There are three placeholders that allow reuse of the value of a metadata field. dm:DSpaceValue will be replaced by the value as it is. dm:LiteralGenerator allows one to specify a regex and a replacement string for it (see the syntax of the Java classes java.util.regex.Pattern and java.util.regex.Matcher) and creates a Literal out of the field value using the regex and the replacement string. dm:ResourceGenerator does the same as dm:LiteralGenerator, but it generates an HTTP(S) URI that is used in its place. So you can use the resource generator to generate URIs containing modified field values (e.g. to link to classifications). If you know regular expressions and Turtle, the syntax should be quite self-explanatory.
[dspace-source]/dspace/config/modules/rdf/fuseki-assembler.ttl
This is a configuration for the triple store Fuseki of the Apache Jena project. You can find more information on the configuration it provides in the section Ins
tall a Triple Store above.
[dspace-source]/dspace/config/spring/api/rdf.xml
This file defines which classes are loaded by DSpace to provide the RDF functionality. There are two things you might want to change: the class that is responsible for generating the URIs to be used within the converted data, and the list of plugins used during conversion. To change the class responsible for the URIs, change the following line:
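The bean definition in question looks similar to the following sketch (its class attribute selects the generator; check your own rdf.xml for the exact form):

```xml
<bean id="org.dspace.rdf.storage.URIGenerator"
      class="org.dspace.rdf.storage.LocalURIGenerator"/>
```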
This line defines how URIs should be generated for use within the converted data. The LocalURIGenerator generates URIs using the ${dspace.url} property. The HandleURIGenerator uses handles in the form of HTTP URLs; it uses the property ${handle.canonical.prefix} to convert handles into HTTP(S) URLs. The class org.dspace.rdf.storage.DOIURIGenerator uses DOIs in the form of HTTP URLs if possible, or local URIs if there are no DOIs; it uses the DOI resolver "http://dx.doi.org" to convert DOIs into HTTP URLs. The class org.dspace.rdf.storage.DOIHandleGenerator does the same but uses Handles as fallback if no DOI exists. The fallbacks are necessary as DOIs are currently used for Items only, and not for Communities or Collections.
All plugins that are instantiated within the configuration file will automatically be used during the conversion. By default the list looks like the following:
<!-- configure all plugins the converter should use. If you don't want to
     use a plugin, remove it here. -->
<bean id="org.dspace.rdf.conversion.SimpleDSORelationsConverterPlugin"
      class="org.dspace.rdf.conversion.SimpleDSORelationsConverterPlugin"/>
<bean id="org.dspace.rdf.conversion.MetadataConverterPlugin"
      class="org.dspace.rdf.conversion.MetadataConverterPlugin"/>
<bean id="org.dspace.rdf.conversion.StaticDSOConverterPlugin"
      class="org.dspace.rdf.conversion.StaticDSOConverterPlugin"/>
You can remove plugins if you don't want them. If you develop a new conversion plugin, you will want to add its class to this list.
Maintenance
As described above, you should add rdf to the property event.dispatcher.default.consumers in dspace.cfg. This configures DSpace to automatically update the triple store every time the publicly available content of the repository changes. Nevertheless, there is a command line tool that lets you update the content of the triple store manually. As the triple store is used as a cache only, you can delete its content and reindex it whenever you think it is necessary or helpful. The command line tool can be started by the following command, which will show its online help:
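A typical invocation (assuming a standard installation path; the rdfizer tool itself is named in the commands below) would be:

```shell
[dspace-install]/bin/dspace rdfizer --help
```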
The online help should give you all necessary information. There are commands to delete one specific entity; to delete all information stored in the triple store; to convert one item, one collection or one community (including all subcommunities, collections and items); or to convert the complete content of your repository. If you start using the Linked Open Data support on a repository that already contains content, you should run [dspace-install]/bin/dspace rdfizer --convert-all once.
Every time content of DSpace is converted or Linked Data is requested, DSpace will try to connect to the triple store, so ensure that it is running (as you do with e.g. your servlet container or relational database).
SWORDv1 Client
The embedded SWORD Client allows a user (currently restricted to an administrator) to copy an item to a SWORD server. This allows your DSpace
installation to deposit items into another SWORD-compliant repository (including another DSpace install).
The SWORDv1 Client is not available in DSpace 7.0. It may be restored in a later 7.x release, see DSpace Release 7.0 Status
Property: sword-client.targets
Example value:
sword-client.targets = http://localhost:8080/sword/servicedocument, \
http://client.swordapp.org/client/servicedocument, \
http://dspace.swordapp.org/sword/servicedocument, \
http://sword.eprints.org/sword-app/servicedocument, \
http://sword.intralibrary.com/IntraLibrary-Deposit/service, \
http://fedora.swordapp.org/sword-fedora/servicedocument
Informational note: List of remote Sword servers. Used to build the drop-down list of selectable SWORD targets.
Property: sword-client.file-types
Informational note: List of file types from which the user can select. If a type is not supported by the remote server
it will not appear in the drop-down list.
Property: sword-client.package-formats
Example value:
sword-client.package-formats = http://purl.org/net/sword-types/METSDSpaceSIP
Informational note: List of package formats from which the user can select. If a format is not supported by the remote server
it will not appear in the drop-down list.
Exchanging Content Between Repositories
1 Transferring Content via Export and Import
1.1 Transferring Communities, Collections, or Items using Packages
2 Transferring Items using Simple Archive Format
3 Transferring Items using OAI-ORE/OAI-PMH Harvester
First, you should export the DSpace Item(s) into the Simple Archive Format, as detailed at: Importing and Exporting Items via Simple Archive Format (SAF).
Be sure to use the --migrate option, which removes fields that would be duplicated on import. Then import the resulting files into the other instance.
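The two steps above might look like this on the command line (the handles, paths and eperson address are placeholders; the exact flags are documented on the SAF page referenced above):

```shell
# On the source DSpace: export a collection in Simple Archive Format,
# using --migrate to strip fields that would be duplicated on import
[dspace]/bin/dspace export --type=COLLECTION --id=123456789/2 \
    --dest=/tmp/saf-export --number=0 --migrate

# On the target DSpace: import the resulting directories into a collection
[dspace]/bin/dspace import --add --eperson=admin@example.org \
    --collection=123456789/5 --source=/tmp/saf-export --mapfile=/tmp/mapfile
```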
OAI
OAI Interfaces
1 OAI-PMH Server
1.1 OAI-PMH Server Activation
1.2 OAI-PMH Server Maintenance
2 OAI-PMH / OAI-ORE Harvester (Client)
2.1 Harvesting from another DSpace
2.2 OAI-PMH / OAI-ORE Harvester Configuration
2.3 Setting up a harvest to import content into a collection
2.3.1 Using the "harvest" script
2.3.1.1 Examples of harvesting a collection through CLI commands
2.3.2 Setting up a harvest content source from the UI
3 DSpace 7 Demo - OAI-PMH
OAI-PMH Server
In the following sections and subpages, you will learn how to configure the OAI-PMH server and activate additional OAI-PMH crosswalks. The reader is also referred to OAI-PMH Data Provider for more in-depth details of the program.
The OAI-PMH Interface may be used by other systems to harvest metadata records from your DSpace.
If you modify either of these configurations, you must restart your Servlet Container (usually Tomcat).
You can test that it is working by sending a request to: [dspace.server.url]/[oai.path]/request?verb=Identify (e.g.
http://localhost:8080/server/oai/request?verb=Identify)
The response should look similar to the response from the DSpace 7 Demo Server: https://api7.dspace.org/server/oai/request?verb=Identify
If you're using a recent browser, you should see an HTML page describing your repository. What you're getting from the server is in fact an XML file with a link to an XSLT stylesheet that renders this HTML in your browser (client-side). Any browser that cannot interpret XSLT will display pure XML. The default stylesheet is located in [dspace-source]/dspace-oai/src/main/resources/static/style.xsl and can be changed by configuring the stylesheet attribute of the Configuration element in [dspace]/config/crosswalks/oai/xoai.xml.
Relevant Links
OAI 2.0 Server - basic information needed to configure and use the OAI Server in DSpace
OAI-PMH Data Provider 2.0 (Internals) - information on how it's implemented
http://www.openarchives.org/pmh/ - information on the OAI-PMH protocol and its usage (not DSpace-specific)
Here's an example cron that can be used to schedule an OAI-PMH reindex on a nightly basis (for a full list of recommended DSpace cron tasks, see Scheduled Tasks via Cron):
# Update the OAI-PMH index with the newest content at midnight every day
# NOTE: ONLY NECESSARY IF YOU ARE RUNNING OAI-PMH
# (This ensures new content is available via OAI-PMH)
0 0 * * * [dspace.dir]/bin/dspace oai import > /dev/null
More information about the dspace oai commandline tool can be found in the OAI Manager documentation.
OAI-PMH / OAI-ORE Harvester (Client)
This section describes the parameters used in configuring the OAI-PMH / OAI-ORE harvester. This harvester can be used to harvest content (bitstreams and metadata) into DSpace from an external OAI-PMH or OAI-ORE server.
OAI Harvesting was not available in DSpace 7.0. It was restored in DSpace 7.1. See DSpace Release 7.0 Status
If the external DSpace is running v6.x or below, it must be running both the OAI-PMH interface and the XMLUI interface to support harvesting content from
it via OAI-ORE.
If the external DSpace is running v7.x or above, it just needs to be running the OAI-PMH interface.
You can verify that the OAI-ORE harvesting option is enabled by following these steps:
1. First, check to see if the external DSpace reports that it will support harvesting ORE via the OAI-PMH interface. Send the following request to the
DSpace's OAI-PMH interface: http://[full-URL-to-OAI-PMH]/request?verb=ListRecords&metadataPrefix=ore
The response should be an XML document containing ORE, similar to the response from the DSpace Demo Server: http://demo.dspace.
org/oai/request?verb=ListRecords&metadataPrefix=ore
2. For 6.x or below, you can verify that the XMLUI interface supports OAI-ORE (it should, as long as it's a current version of DSpace). First, find a
valid Item Handle. Then, send the following request to the DSpace's XMLUI interface: http://[full-URL-to-XMLUI]/metadata/handle/
[item-handle]/ore.xml
The response should be an OAI-ORE (XML) document which describes that specific Item. It should look similar to the response from the
DSpace Demo Server: http://demo.dspace.org/xmlui/metadata/handle/10673/3/ore.xml
For examples of how to set up harvesting via the User Interface, see the "Content Source" settings of the "Edit Collection" documentation
Configuration File: [dspace]/config/modules/oai.cfg
Property: oai.harvester.eperson
Informational The EPerson under whose authorization automatic harvesting will be performed. This field does not have a default value and must
Note: be specified in order to use the harvest scheduling system. This will most likely be the DSpace admin account created during
installation.
Property: oai.url
Informational The base url of the OAI-PMH disseminator webapp (i.e. do not include the /request on the end). This is necessary in order to mint
Note: URIs for ORE Resource Maps. The default value of ${dspace.baseUrl}/oai will work for a typical installation, but should be
changed if appropriate. Please note that dspace.baseUrl is defined in your dspace.cfg configuration file.
Property: oai.ore.authoritative.source
Informational The webapp responsible for minting the URIs for ORE Resource Maps. If using oai, the oai.url config value must be set.
Note:
When set to 'oai', all URIs in ORE Resource Maps will be relative to the OAI-PMH URL (configured by oai.url above)
The URIs generated for ORE ReMs follow the following convention for either setting: http://[base-URL]/metadata/handle/[item-handle]/ore.xml
Property: oai.harvester.autoStart
Informational Determines whether the harvest scheduler process starts up automatically when DSpace webapp is redeployed.
Note:
Property: oai.harvester.metadataformats.PluginName
Example Value:
oai.harvester.metadataformats.PluginName = \
http://www.openarchives.org/OAI/2.0/oai_dc/, Simple Dublin Core
Informational This field can be repeated and serves as a link between the metadata formats supported by the local repository and those
Note: supported by the remote OAI-PMH provider. It follows the form oai.harvester.metadataformats.PluginName =
NamespaceURI,Optional Display Name . The pluginName designates the metadata schemas that the harvester "knows" the
local DSpace repository can support. Consequently, the PluginName must correspond to a previously declared ingestion crosswalk.
The namespace value is used during negotiation with the remote OAI-PMH provider, matching it against a list returned by the
ListMetadataFormats request, and resolving it to whatever metadataPrefix the remote provider has assigned to that namespace.
Finally, the optional display name is the string that will be displayed to the user when setting up a collection for harvesting. If
omitted, the PluginName:NamespaceURI combo will be displayed instead.
Property: oai.harvester.oreSerializationFormat.OREPrefix
Example Value:
oai.harvester.oreSerializationFormat.OREPrefix = \
http://www.w3.org/2005/Atom
Informational Note: This field works in much the same way as oai.harvester.metadataformats.PluginName. The OREPrefix must correspond to a declared ingestion crosswalk, while the Namespace must be supported by the target OAI-PMH provider when harvesting content.
Property: oai.harvester.timePadding
Informational Note: Amount of time subtracted from the from argument of the PMH request to account for the time taken to negotiate a connection. Measured in seconds. Default value is 120.
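The effect of the padding can be sketched as follows (an illustrative Python sketch of the idea only, not DSpace's actual implementation; the function name is hypothetical):

```python
from datetime import datetime, timedelta

def pmh_from_argument(last_harvest: datetime, time_padding_seconds: int = 120) -> str:
    """Compute the 'from' argument for an OAI-PMH request by subtracting
    the configured padding from the last harvest time, so records added
    while the connection was being negotiated are not missed."""
    padded = last_harvest - timedelta(seconds=time_padding_seconds)
    return padded.strftime("%Y-%m-%dT%H:%M:%SZ")
```

With the default padding of 120 seconds, a harvest that last ran at 00:02:00 would request records from 00:00:00 onwards.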
Property: oai.harvester.harvestFrequency
Informational Note: How frequently the harvest scheduler checks the remote provider for updates. Should always be longer than timePadding. Measured in minutes. Default value is 720.
Property: oai.harvester.minHeartbeat
Informational Note: The heartbeat is the frequency at which the harvest scheduler queries the local database to determine if any collections are due for a harvest cycle (based on the harvestFrequency value). The scheduler is optimized to then sleep until the next collection is actually ready to be harvested. The minHeartbeat and maxHeartbeat are the lower and upper bounds on this timeframe. Measured in seconds. Default value is 30.
Property: oai.harvester.maxHeartbeat
Informational Note: The heartbeat is the frequency at which the harvest scheduler queries the local database to determine if any collections are due for a harvest cycle (based on the harvestFrequency value). The scheduler is optimized to then sleep until the next collection is actually ready to be harvested. The minHeartbeat and maxHeartbeat are the lower and upper bounds on this timeframe. Measured in seconds. Default value is 3600 (1 hour).
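The relationship between the two heartbeat bounds can be illustrated with a small sketch (hypothetical function, assuming the default values above; DSpace's scheduler is more involved):

```python
def scheduler_sleep(seconds_until_next_due: int,
                    min_heartbeat: int = 30,
                    max_heartbeat: int = 3600) -> int:
    """Clamp the scheduler's sleep time between the configured
    minHeartbeat and maxHeartbeat bounds (illustrative sketch)."""
    return max(min_heartbeat, min(seconds_until_next_due, max_heartbeat))
```

So even if a collection is due in 5 seconds, the scheduler waits at least minHeartbeat; if nothing is due for a day, it still wakes up after at most maxHeartbeat.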
Property: oai.harvester.maxThreads
Informational Note: How many harvest process threads the scheduler can spool up at once. Default value is 3.
Property: oai.harvester.threadTimeout
Informational Note: How much time passes before a harvest thread is terminated. The termination process waits for the current item to complete ingest and saves progress made up to that point. Measured in hours. Default value is 24.
Property: oai.harvester.unknownField
Example Value: oai.harvester.unknownField = fail | add | ignore
Informational Note: You have three (3) choices. When a harvest process completes for a single item and it has been passed through ingestion crosswalks for ORE and its chosen descriptive metadata format, it might end up with DIM values that have not been defined in the local repository. This setting determines what should be done in the case where those DIM values belong to an already declared schema. Fail will terminate the harvesting task and generate an error. Ignore will quietly omit the unknown fields. Add will add the missing field to the local repository's metadata registry. Default value: fail.
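The three policies can be summarized with a small sketch (hypothetical names; the real logic lives in DSpace's harvester code):

```python
def resolve_unknown_field(policy, field, registry):
    """Apply the oai.harvester.unknownField policy to a DIM field that is
    not yet defined in the local metadata registry (illustrative sketch)."""
    if policy == "fail":
        # terminate the harvesting task with an error
        raise ValueError(f"unknown metadata field: {field}")
    if policy == "add":
        registry.add(field)       # register the missing field locally
        return field
    return None                   # "ignore": quietly omit the field
```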
Property: oai.harvester.unknownSchema
Informational Note: When a harvest process completes for a single item and it has been passed through ingestion crosswalks for ORE and its chosen descriptive metadata format, it might end up with DIM values that have not been defined in the local repository. This setting determines what should be done in the case where those DIM values belong to an unknown schema. Fail will terminate the harvesting task and generate an error. Ignore will quietly omit the unknown fields. Add will add the missing schema to the local repository's metadata registry, using the schema name as the prefix and "unknown" as the namespace. Default value: fail.
Property: oai.harvester.acceptedHandleServer
Example Value:
oai.harvester.acceptedHandleServer = \
hdl.handle.net, handle.test.edu
Informational Note: A harvest process will attempt to scan the metadata of the incoming items (the identifier.uri field, to be exact) to see if it looks like a handle. If so, it matches the pattern against the values of this parameter. If there is a match, the new item is assigned the handle from the metadata value instead of minting a new one. Default value: hdl.handle.net.
Property: oai.harvester.rejectedHandlePrefix
Informational Note: Pattern to reject as an invalid handle prefix (a known test string, for example) when attempting to find the handle of harvested items. If there is a match with this config parameter, a new handle will be minted instead. Default value: 123456789.
-p --purge [none] Delete all the items in the collection provided with the -c parameter.
-r --run [none] Run the standard harvesting procedure for the collection provided with the -c parameter.
-g --ping [none] Verify that the server provided through the -a parameter and the set provided through the -i parameter can be resolved and work.
-s --setup [none] Set the collection provided with the -c parameter up for harvesting. The server will need to be provided through the -a parameter, and the OAI set id needs to be provided through the -i parameter.
-o --reimport [none] Reimport all items in the collection provided by the -c parameter. This is the equivalent of running both the -p and the -r commands for the provided collection.
-t --type [type-code] The type of harvesting: 0 for no harvesting, 1 for metadata only, 2 for metadata and bitstream references (requires ORE support), 3 for metadata and bitstreams (requires ORE support).
-i --oai_set_id [set-id] The id of the PMH set representing the harvested collection. In case all sets need to be harvested, the value "all" should be provided.
-m --metadata_format [format] The name of the desired metadata format for harvesting, resolved to namespace and crosswalk in the dspace.cfg.
-e --eperson [email] (CLI ONLY) The eperson that performs the harvest. When the command is used from the REST API, the currently logged in user will be used.
Replace https://harvest.source.org with the source you want to use, and harvest-set with the set(s) you want to harvest, or all in case you want to harvest all sets.
Replace 123456789/123 with your collection, the [email protected] with an existing user in DSpace that has sufficient rights to perform the ingestion, https://harvest.source.org with the source you want to use, and harvest-set with the set(s) you want to harvest, or all in case you want to harvest all sets. The -m parameter indicates the metadata format to be used and the -t parameter indicates the harvest type to be used. When the value 0 is used for -t, harvesting will be disabled.
Replace 123456789/123 with your collection and the [email protected] with an existing user in DSpace that has sufficient rights to perform the ingestion.
Navigate to the "Edit collection" > "Content Source" tab. Tick the checkbox "This collection harvests its content from an external source".
Once the checkbox has been ticked, the OAI provider, set id and metadata format can be configured. An example of the configuration can be found in the image below.
When all sets need to be harvested, the field can be left empty.
The server configuration will be tested upon clicking the "Save" button.
Click the "Import Now" button to start the import. When the import has started, the button will indicate that the import is in progress; however, there is no need to remain on this page, as the harvest will continue to run after leaving it.
If the current server configuration needs to be retested at a later point, the "Test configuration" button can be used. To fully reset the collection by purging
all items and starting a reimport, click the "Reset and reimport" button.
OAI-PMH Data Provider 2.0 (Internals)
1 OAI-PMH Data Provider 2.0 (Internals)
1.1 Sets
1.2 Unique Identifier
1.3 Access control
1.4 Modification Date (OAI Date Stamp)
1.5 "About" Information
1.6 Deletions
1.7 Flow Control (Resumption Tokens)
The DSpace build process builds a single backend webapp, which optionally includes an OAI-PMH endpoint (when oai.enabled=true). In a typical configuration, this endpoint is deployed at ${dspace.server.url}/oai (configured by "oai.path"), containing request, driver and openaire contexts, for example:
http://dspace.myu.edu/server/oai/request?verb=Identify
http://dspace.myu.edu/server/oai/request
http://dspace.myu.edu/server/oai/driver
http://dspace.myu.edu/server/oai/openaire
Sets
OAI-PMH allows repositories to expose a hierarchy of sets in which records may be placed. A record can be in zero or more sets.
Each community and collection has a corresponding OAI set, discoverable by harvesters via the ListSets verb. The setSpec is based on the community
/collection handle, with the "/" converted to underscore to form a legal setSpec. The setSpec is prefixed by "com_" or "col_" for communities and
collections, respectively (this is a change in set names in DSpace 3.0 / OAI 2.0). For example:
col_1721.1_1234
Naturally enough, the community/collection name is also the name of the corresponding set.
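The setSpec derivation described above can be sketched as follows (illustrative only):

```python
def handle_to_setspec(handle: str, kind: str = "col") -> str:
    """Convert a community/collection handle into a legal OAI setSpec:
    '/' becomes '_' and a 'com_' or 'col_' prefix is prepended."""
    return f"{kind}_{handle.replace('/', '_')}"
```

For the handle 1721.1/1234 of a collection, this yields the setSpec col_1721.1_1234 shown above.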
Unique Identifier
Every item in an OAI-PMH data repository must have a unique identifier, which must conform to the URI syntax. As of DSpace 1.2, Handles are not used; this is because in OAI-PMH, the OAI identifier identifies the metadata record associated with the resource. The resource is the DSpace item, whose resource identifier is the Handle. In practical terms, using the Handle for the OAI identifier may cause problems in the future if DSpace instances share items with the same Handles; the OAI metadata record identifiers should be different, as the different DSpace instances would need to be harvested separately and may have different metadata for the item.
oai:PREFIX:handle
For example:
oai:dspace.myu.edu:123456789/345
If you wish to use a different scheme, this can easily be changed by editing the value of identifier.prefix in the [dspace]/config/modules/oai.cfg file.
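Combining the configured prefix and the item's Handle, the identifier construction is a simple concatenation (illustrative sketch; the prefix comes from oai.identifier.prefix):

```python
def oai_identifier(prefix: str, handle: str) -> str:
    """Build the OAI record identifier following the oai:PREFIX:handle scheme."""
    return f"oai:{prefix}:{handle}"
```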
Access control
OAI provides no authentication/authorisation details, although these could be implemented using standard HTTP methods. It is assumed that all access
will be anonymous for the time being.
A question is, "is all metadata public?" Presently the answer to this is yes; all metadata is exposed via OAI-PMH, even if the item has restricted access
policies. The reasoning behind this is that people who do actually have permission to read a restricted item should still be able to use OAI-based services
to discover the content. However, the exposed data can be changed by modifying the XSLT defined at [dspace]/config/crosswalks/oai/metadataFormats.
"About" Information
As part of each record given out to a harvester, there is an optional, repeatable "about" section which can be filled out in any (XML-schema conformant)
way. Common uses are for provenance and rights information, and there are schemas in use by OAI communities for this. Presently DSpace does not
provide any of this information, but the XOAI core library allows its definition. This requires diving into the code and making some changes.
Deletions
As DSpace supports two forms of deletions (withdrawals or permanent expunging), this has an impact on how OAI-PMH exposes deletions. During a
permanent deletion (expunge), DSpace no longer retains any information about the deleted object. Therefore, permanent deletions "disappear" from OAI-
PMH, as DSpace no longer has any information about the object. This is considered a "transient" approach to deletion based on OAI-PMH definitions.
When an item is withdrawn in DSpace, the item still exists but is hidden from public view. Withdrawn items will report a "<header status="deleted">" in OAI-
PMH when a GetRecord request is made for a withdrawn item (however, they are NOT shown in an OAI-PMH "ListRecords" request by default). Keep in
mind that the OAI-PMH index does NOT update automatically, so withdrawn items will not show this "deleted" status until "./dspace oai import" is next run.
Once an item has been withdrawn, OAI-PMH harvests of the date range in which the withdrawal occurred will find the "deleted" record header. Harvests of
a date range prior to the withdrawal will not find the record, despite the fact that the record did exist at that time. As an example of this, consider an item
that was created on 2002-05-02 and withdrawn on 2002-10-06. A request to harvest the month 2002-10 will yield the "record deleted" header. However, a
harvest of the month 2002-05 will not yield the original record.
DSpace supports resumption tokens for "ListRecords", "ListIdentifiers" and "ListSets" OAI-PMH requests.
Each OAI-PMH ListRecords request will return at most 100 records (by default), but this can be configured in the [dspace]/config/crosswalks/oai/xoai.xml file.
When a resumption token is issued, the optional completeListSize and cursor attributes are included. OAI 2.0 resumption tokens are persistent, so the expirationDate of the resumption token is undefined; they do not expire.
Resumption tokens contain all the state information required to continue a request.
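The paging behaviour behind resumption tokens can be sketched as follows (a simplified model; real tokens encode more state, such as the verb, set and date range):

```python
def page_records(records, max_list_size=100):
    """Split a full result list into pages of at most max_list_size records,
    yielding (page, resumption_token) pairs; the token is None on the last
    page, which tells the harvester the list is complete."""
    for start in range(0, len(records), max_list_size):
        page = records[start:start + max_list_size]
        next_start = start + max_list_size
        token = str(next_start) if next_start < len(records) else None
        yield page, token
```

A harvester simply repeats the request with each returned token until it receives a response without one.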
OAI 2.0 Server
1 Introduction
1.1 What is OAI 2.0?
1.2 Why OAI 2.0?
1.3 Concepts (XOAI Core Library)
2 OAI 2.0
2.1 Indexing OAI content
2.1.1 OAI Manager
2.1.2 Scheduled Tasks
2.2 Client-side stylesheet
2.3 Metadata Formats
2.4 Encoding problems
3 Configuration
3.1 Basic Configuration
3.2 Advanced Configuration
3.2.1 General options
3.2.2 Add/Remove Metadata Formats
3.2.3 Add/Remove Metadata Fields
4 Driver/OpenAIRE compliance
4.1 Driver Compliance
4.2 OpenAIRE compliance
5 Sanity check your OAI interface with the OAI Validator
Introduction
The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a low-barrier mechanism for repository interoperability. Data Providers are repositories that expose structured metadata via OAI-PMH. Service Providers then make OAI-PMH service requests to harvest that metadata. OAI-PMH is a set of six verbs or services that are invoked within HTTP.
http://www.example.com/oai/<context>
Contexts can be seen as distinct virtual OAI interfaces, so one could have, for example:
http://www.example.com/oai/request
http://www.example.com/oai/driver
http://www.example.com/oai/openaire
With these ingredients it is possible to build a robust solution that fulfills all the requirements of Driver, OpenAIRE and also other project-specific requirements. As shown in Figure 1, with contexts one could select a subset of all available items in the data source. So when entering the OpenAIRE context, all OAI-PMH requests will be restricted to that subset of items.
At this stage, contexts could be seen as sets (also defined in the basic OAI-PMH protocol). The magic of XOAI happens when one needs a specific metadata format to be shown in each context. The metadata requirements of Driver differ slightly from the OpenAIRE ones, so for each context one must define its specific transformer. Contexts could therefore be seen as an extension of the concept of sets.
To implement an OAI interface from the XOAI core library, one just needs to implement the data source interface.
OAI 2.0
OAI 2.0 is deployed as a part of the DSpace server (backend) webapp. OAI 2.0 has a configurable data source; by default it will not query the DSpace SQL database at the time of the OAI-PMH request. Instead, it keeps the required metadata in its Solr index (currently in a separate "oai" Solr core) and serves it from there. Furthermore, it caches the requests, so repeating the same query is very fast. In addition, it also compiles DSpace items to make uncached responses much faster.
The OAI 2.0 Server only uses Solr for its indexing. The previous capability to use database indexing has been removed.
The Solr index can be updated at your convenience, depending on how fresh you need the information to be. Typically, the administrator sets up a nightly
cron job to update the Solr index from the SQL database.
OAI Manager
OAI manager is a utility that allows one to do certain administrative operations with OAI. You can call it from the command line using the dspace launcher:
Syntax: [dspace]/bin/dspace oai <action> [parameters]
Actions
import Imports DSpace items into OAI Solr index (also cleans OAI cache)
clean-cache Cleans the OAI cache
Parameters
-c Clears the Solr index before indexing (it will import all items again)
-v Verbose output
-h Shows a help text
Scheduled Tasks
In order to refresh the OAI Solr index, you need to run the [dspace]/bin/dspace oai import command periodically. You can add the following task to your crontab:
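For example, a nightly run at 3:00 AM might look like this (the schedule is illustrative; adjust it and the path to your installation):

```
# Refresh the OAI Solr index from the SQL database every night at 3:00 AM
0 3 * * * [dspace]/bin/dspace oai import
```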
Note that [dspace] should be replaced by the correct value, that is, the value defined in dspace.cfg parameter dspace.dir.
Client-side stylesheet
The OAI-PMH response is an XML file. While OAI-PMH is primarily used by harvesting tools and usually not directly by humans, sometimes it can be
useful to look at the OAI-PMH requests directly - usually when setting it up for the first time or to verify any changes you make. For these cases, XOAI
provides an XSLT stylesheet to transform the response XML to a nice looking, human-readable and interactive HTML. The stylesheet is linked from the
XML response and the transformation takes place in the user's browser (this requires a recent browser, older browsers will only display the XML directly).
Most automated tools are interested only in the XML file itself and will not perform the transformation. If you want, you can change which stylesheet will be used by placing it into the [dspace]/webapps/oai/static directory (or into [dspace-src]/dspace-xoai/dspace-xoai-webapp/src/main/webapp/static, after which you have to rebuild DSpace), modifying the "stylesheet" attribute of the "Configuration" element in [dspace]/config/crosswalks/oai/xoai.xml, and restarting your servlet container.
Metadata Formats
By default OAI 2.0 provides 12 metadata formats within the /request context:
1. OAI_DC
2. DIDL
3. DIM
4. ETDMS
5. METS
6. MODS
7. OAI-ORE
8. QDC
9. RDF
10. MARC
11. UKETD_DC
12. XOAI
Within the /driver context, 3 metadata formats are provided:
1. OAI_DC
2. DIDL
3. METS
Within the /openaire context, 2 metadata formats are provided:
1. OAI_DC
2. METS
Encoding problems
There are two main potential sources of encoding problems:
a) The servlet connector port has to use the correct encoding. E.g. for Tomcat, this would be <Connector port="8080" ... URIEncoding="UTF-8" />, where the port attribute specifies the port of the connector that DSpace is configured to access Solr on (this is usually 8080, 80 or, in the case of AJP, 8009).
b) The system locale of the dspace command-line script that is used to do the OAI import. Make sure the user account launching the script (usually from cron) has the correct locale set (e.g. en_US.UTF-8). Also make sure the locale is actually present on your system.
Configuration
Basic Configuration
Configuration File: [dspace]/config/modules/oai.cfg
Property: oai.enabled
Property: oai.path
Information Note: Allows you to specify the path where the OAI module will be deployed. This path is relative to the dspace.server.url. So, for example, if "dspace.server.url=http://localhost:8080/server", then by default the OAI module is available at http://localhost:8080/server/oai/
Property: oai.storage
Information Note: This allows you to choose the OAI data source between solr and database. ONLY "solr" is supported at this time.
Property: oai.solr.url
Property: oai.identifier.prefix
Property: oai.config.dir
Informational Note: Configuration directory, used by XOAI (core library). Contains xoai.xml, metadata format XSLTs and transformer XSLTs.
Property: oai.cache.enabled
Informational Note: Whether to enable the OAI cache. Default is true (for better performance).
Property: oai.cache.dir
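Taken together, a minimal set of overrides for the properties above might look like this in local.cfg (illustrative values; check the defaults shipped in [dspace]/config/modules/oai.cfg before copying):

```
oai.enabled = true
oai.path = oai
oai.storage = solr
oai.cache.enabled = true
```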
Advanced Configuration
OAI 2.0 allows you to configure the following advanced options:
Contexts
Transformers
Metadata Formats
Filters
Sets
General options
These options influence the OAI interface globally. "Per page" means per request; the next page (if there is one) can be requested using the resumptionToken provided in the current page.
identation [boolean] - whether the output XML should be indented to make it human-readable
maxListIdentifiersSize [integer] - how many identifiers to show per page (verb=ListIdentifiers)
maxListRecordsSize [integer] - how many records to show per page (verb=ListRecords)
maxListSetsSize [integer] - how many sets to show per page (verb=ListSets)
stylesheet [relative file path] - an xsl stylesheet used by client's web browser to transform the output XML into human-readable HTML
Their location and default values are shown in the following fragment:
<Configuration xmlns="http://www.lyncode.com/XOAIConfiguration"
identation="false"
maxListIdentifiersSize="100"
maxListRecordsSize="100"
maxListSetsSize="100"
stylesheet="static/style.xsl">
<Context baseurl="request">
<Format refid="oaidc" />
<Format refid="mets" />
<Format refid="xoai" />
<Format refid="didl" />
<Format refid="dim" />
<Format refid="ore" />
<Format refid="rdf" />
<Format refid="etdms" />
<Format refid="mods" />
<Format refid="qdc" />
<Format refid="marc" />
<Format refid="uketd_dc" />
</Context>
<Context baseurl="driver">
<Format refid="oaidc" />
<Format refid="mets" />
<Format refid="didl" />
<Format refid="dim" />
<Format refid="ore" />
<Format refid="rdf" />
<Format refid="etdms" />
<Format refid="mods" />
<Format refid="qdc" />
<Format refid="marc" />
<Format refid="uketd_dc" />
</Context>
It is also possible to create a new metadata format by creating a specific XSLT for it. All XSLTs already defined for DSpace can be found in the [dspace]/config/crosswalks/oai/metadataFormats directory. After producing a new one, add the following information (locations marked using brackets) inside the <Formats> element in [dspace]/config/crosswalks/oai/xoai.xml:
<Format id="[IDENTIFIER]">
<Prefix>[PREFIX]</Prefix>
<XSLT>metadataFormats/[XSLT]</XSLT>
<Namespace>[NAMESPACE]</Namespace>
<SchemaLocation>[SCHEMA_LOCATION]</SchemaLocation>
</Format>
where:
Parameter Description
IDENTIFIER The identifier used within context configurations to reference this specific format; must be unique among all available Metadata Formats.
Therefore, exposing any DSpace metadata field in any OAI format is just a matter of modifying the corresponding output format stylesheet (this assumes general knowledge of how XSLT works; for a tutorial, see e.g. http://www.w3schools.com/xsl/).
For example, if you have a metadata field "local.note.librarian" that you want to expose in oai_dc as <dc:note> (please note that this is not a valid DC field and thus breaks compatibility), then edit oai_dc.xsl and add the following lines just above the closing tag </oai_dc:dc>:
<xsl:for-each select="doc:metadata/doc:element[@name='local']/doc:element[@name='note']/doc:element/doc:element
/doc:field[@name='librarian']">
<dc:note><xsl:value-of select="." /></dc:note>
</xsl:for-each>
If you need to add/remove metadata fields, you're changing the output format. Therefore it is recommended to create a new metadata format as a copy of
the one you want to modify. This way the old format will remain available along with the new one and any upgrades to the original format during DSpace
upgrades will not overwrite your customizations. If you need the format to have the same name as the original format (e.g. the default oai_dc format), you can create a new context in xoai.xml containing your modified format with the original name, which will be available as /oai/context-name.
NOTE: Please keep in mind that the OAI provider caches the transformed output, so you have to run [dspace]/bin/dspace oai clean-cache after any .xsl modification and reload the OAI page for the changes to take effect. When adding/removing metadata formats, making changes in [dspace]/config/crosswalks/oai/xoai.xml requires reloading/restarting the servlet container.
Driver/OpenAIRE compliance
The default OAI 2.0 installation provides two new contexts: /driver and /openaire.
However, in order to be exposed, DSpace items must be compliant with the Driver/OpenAIRE guidelines.
Driver Compliance
The DRIVER Guidelines for Repository Managers and Administrators describe how to expose digital scientific resources using OAI-PMH and Dublin Core metadata, creating interoperability by homogenizing the repository output. The OAI-PMH driver set is based on DRIVER Guidelines 2.0.
This set is used to expose items of the repository that are available for open access. It’s not necessary for all the items of the repository to be available for
open access.
To have items in this set, you must configure your input-forms.xml file in order to comply with the DRIVER Guidelines:
As DRIVER guidelines use Dublin Core, all the needed items are already registered in DSpace. You just need to configure the deposit process.
OpenAIRE compliance
For OpenAIRE v4 compliance, see OpenAIRE4 Guidelines Compliancy
The OpenAIRE Guidelines 2.0 provide the OpenAIRE compatibility to repositories and aggregators. By implementing these Guidelines, repository
managers are facilitating the authors who deposit their publications in the repository in complying with the EC Open Access requirements. For developers
of repository platforms, the Guidelines provide guidance to add supportive functionalities for authors of EC-funded research in future versions.
The name of the set in OAI-PMH is "ec_fundedresources" and it will expose the items of the repository that comply with these guidelines. These guidelines are built on top of the DRIVER guidelines. See version 2.0 of the Guidelines.
These are the OpenAIRE metadata values only; to check these and the DRIVER metadata values, see page 11 of the OpenAIRE guidelines 2.0.
Optionally:
dc:date with the embargo end date (recommended for embargoed items)
<dc:date>info:eu-repo/date/embargoEnd/2011-05-12</dc:date>
Have a dc:relation field in input-forms.xml with a list of the projects. You can also use the OpenAIRE Authority Control Addon to facilitate the
process of finding the project.
Just use a combo-box for dc:rights to input the 4 options:
info:eu-repo/semantics/closedAccess
info:eu-repo/semantics/embargoedAccess
info:eu-repo/semantics/restrictedAccess
info:eu-repo/semantics/openAccess
Use an input-box for dc:date to insert the embargo end date
Relevant Links
OpenAIRE4 Guidelines Compliancy
Loading of Entities and Fields
OpenAIRE4 features depend on the Configurable Entities feature and its default configurations. In order to have your repository compliant with the OpenAIRE4 guidelines you need to follow some steps:
The default submission-forms.xml file configures the form fields that allow the creation of the specific OpenAIRE entities and their relationships. In order to
use those forms you need to configure your item-submission.xml and add these to the <submission-map>:
item-submission.xml
Please note that collection-handle="123456789/4" will be different in your system; it refers to the collection that will gather a specific Entity type like Publications, Persons, Projects or Organizations.
To load the OpenAIRE Entities model you must first run the following:
After those steps your repository will have the required fields and entities for the compliancy.
OAI interface
As decided in our Entities meeting (2019-11-19 DSpace 7 Entities WG Meeting), the XOAI default context should only display Publications or non-Entity items. For OpenAIRE4, only Publications will be considered as the main Entity to be processed, and all related entities will be loaded in the process.
In order to use it, you must first ensure you have the oai.cfg setting uncommented:
oai.enabled = true
(NOTE: you may need to restart your tomcat service)
If you need to display additional metadata in the oai_openaire metadata format, you can rename the file
[/dspace/]config/spring/api/virtual-metadata.xml.openaire
to
[/dspace/]config/spring/api/virtual-metadata.xml
Please note that if you do this you should restart your servlet container (e.g. Tomcat).
This additional virtual metadata makes it possible to represent something like the following XML in the oai_openaire metadata format, where you have, for instance, author identifiers:
<datacite:creators>
<datacite:creator>
<datacite:creatorName>Evans, R.J.</datacite:creatorName>
<datacite:affiliation>Institute of Science and Technology</datacite:affiliation>
<datacite:nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org">1234-1234-1234-1234</datacite:nameIdentifier>
</datacite:creator>
</datacite:creators>
Then you may need to run the OAI import from the command line with the cache-cleaning parameter (-c) to reload all data into OAI:
[/dspace/]/bin/dspace oai import -c
Signposting
Overview
The concept of Signposting is aimed at helping machine agents navigate scholarly information systems easily. Signposting uses typed links to clarify patterns found in scholarly portals, offering a standard approach to making the descriptive metadata and links in landing pages, usually optimized for human use, readable for machine agents.
To provide machine-friendly authorship information, the publisher can include author links in the Link header of the HTTP response. Additionally, the
publisher can use a "cite-as" link to fetch the persistent identifier of the resource. These links enable bots to follow them and discover relevant additional
information related to the resource.
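As an illustration of what a consuming agent does with these typed links, the sketch below extracts (target, rel) pairs from an HTTP Link header (simplified: it ignores additional link parameters and assumes no commas inside URLs; the function name is hypothetical):

```python
import re

def parse_link_header(header):
    """Parse an HTTP Link header value into a list of (target, rel) pairs,
    e.g. to find the 'cite-as' persistent identifier of a landing page."""
    links = []
    for part in header.split(","):
        m = re.search(r'<([^>]+)>\s*;\s*rel="([^"]+)"', part)
        if m:
            links.append((m.group(1), m.group(2)))
    return links
```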
By adopting Signposting techniques, the users contribute to improving the machine accessibility and navigation of scholarly web resources, enhancing the
overall efficiency and interoperability of scholarly information systems.
DSpace supports FAIR Signposting Profile at Level 2: By supporting the FAIR Signposting Profile at Level 2, your platform demonstrates a commitment to
improving the machine accessibility, interoperability, and reusability of scholarly resources. It ensures that the information you provide is standardized,
consistent, and easily navigable by both human users and machine agents, contributing to a more efficient and FAIR scholarly web ecosystem. More
information on: https://github.com/DSpace/RestContract/blob/main/signposting.md
The FAIR Signposting profile (more information on: https://signposting.org/FAIR/) is based on the FAIR principles (Findable, Accessible, Interoperable, and
Reusable - https://www.go-fair.org/fair-principles/).
Findability: Your system ensures that scholarly resources are easily discoverable by both humans and machines. It includes the use of persistent
identifiers, such as DOIs (Digital Object Identifiers), to uniquely identify and locate resources. These identifiers are included in the signposting
links provided in the HTTP responses.
Accessibility: Your system supports accessibility by providing machine-readable metadata and links that facilitate automated processing. The
Signposting Patterns specified in the profile guide the inclusion of links in the HTTP Link headers, HTML link elements, or Link Sets. These links
convey essential information about the resource, such as authorship, identifiers, and relationships to other resources.
Interoperability: Your system promotes interoperability by adopting standardized formats and protocols. It ensures that the signposting links and
metadata adhere to established conventions and vocabularies, making it easier for machines to interpret and process the information consistently.
By implementing the FAIR Signposting Profile, your system aligns with a community-accepted standard for interoperability.
Reusability: Your system supports reusability by providing clear and structured metadata about scholarly resources. This includes information
about licenses, permissions, and terms of use. By including this information in the signposting links or associated metadata, your system enables
users and machines to understand the conditions under which the resources can be reused.
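As an illustration of the Signposting Patterns described above, a landing-page response might carry typed links in its HTTP Link header. This is a hypothetical sketch, not output from a real DSpace instance: the URIs are placeholders, while the relation types (cite-as, describedby, item, license, type) come from the FAIR Signposting profile at signposting.org.

```
HTTP/1.1 200 OK
Link: <https://doi.org/10.1234/example> ; rel="cite-as" ,
      <https://repo.example.org/server/api/items/0000-metadata> ; rel="describedby" ; type="application/ld+json" ,
      <https://repo.example.org/bitstreams/0000/download> ; rel="item" ; type="application/pdf" ,
      <https://creativecommons.org/licenses/by/4.0/> ; rel="license" ,
      <https://schema.org/ScholarlyArticle> ; rel="type"
```

The exact links DSpace emits are defined in the REST contract linked above; this sketch only shows the general shape of the pattern.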
Enabling / Disabling
Signposting is enabled by default in DSpace 7 (starting with version 7.6). When enabled on the backend, the ${dspace.server.url}/signposting/
REST endpoint will be available and can be used as described in the documentation at https://github.com/DSpace/RestContract/blob/main/signposting.md.
When disabled, this endpoint will return a 404.
However, if you wish to disable it, you can change this configuration in your local.cfg
signposting.enabled = false
Modifications to this setting require restarting your servlet container (e.g. Tomcat)
Configuration
Additional signposting configuration options are available in [dspace]/config/modules/signposting.cfg. For most sites, the default settings
should be all you need.
Ingesting Content and Metadata
This is a new top level page grouping all documentation concerning all different ways to ingest content and metadata into DSpace
The section on Batch Metadata Editing also contains information on how to add items through spreadsheet ingest.
Ingesting HTML Archives
Not yet supported in DSpace 7. See https://github.com/DSpace/DSpace/issues/8635
For the most part, at present DSpace simply supports uploading and downloading of bitstreams as-is. This is fine for the majority of commonly-used file
formats – for example PDFs, Microsoft Word documents, spreadsheets and so forth. HTML documents (Web sites and Web pages) are far more
complicated, and this has important ramifications when it comes to digital preservation:
Web pages tend to consist of several files – one or more HTML files that contain references to each other, and stylesheets and image files that
are referenced by the HTML files.
Web pages also link to or include content from other sites, often imperceptibly to the end-user. Thus, in a few years' time, when someone views
the preserved Web site, they will probably find that many links are now broken or refer to other sites that are now out of context. In fact, it may be
unclear to an end-user when they are viewing content stored in DSpace and when they are seeing content included from another site, or have
navigated to a page that is not stored in DSpace. This problem can manifest when a submitter uploads some HTML content. For example, the
HTML document may include an image from an external Web site, or even from their local hard drive. When the submitter views the HTML in DSpace,
their browser is able to use the reference in the HTML to retrieve the appropriate image, so to the submitter the whole HTML document
appears to have been deposited correctly. However, later on, when another user tries to view that HTML, their browser might not be able to
retrieve the included image since it may have been removed from the external server. Hence the HTML will seem broken.
Often Web pages are produced dynamically by software running on the Web server, and represent the state of a changing database underneath
it.
Dealing with these issues is the topic of much active research. Currently, DSpace bites off a small, tractable chunk of this problem. DSpace can store and
provide on-line browsing capability for self-contained, non-dynamic HTML documents. DSpace allows relative links between HTML documents stored
together in a single item to work. In practical terms, this means:
SWORDv2 Server
SWORD (Simple Web-service Offering Repository Deposit) is a protocol that allows the remote deposit of items into repositories. DSpace implements the
SWORD protocol via the 'sword' web application. The specification and further information can be found at http://swordapp.org/.
SWORD is based on the Atom Publishing Protocol and allows service documents to be requested, which describe the structure of the repository, and
packages to be deposited.
To enable the SWORDv2 server, add the following to your local.cfg:
swordv2-server.enabled = true
# Optionally, if you wish to change its path
swordv2-server.path = swordv2
Keep in mind, modifying these settings will require restarting your Servlet Container (usually Tomcat).
Once enabled, the SWORDv2 module will be available at ${dspace.server.url}/${swordv2-server.path}. For example, if "dspace.server.url=http://localhost:
8080/server", then (by default) it will be available at http://localhost:8080/server/swordv2/
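The way these endpoint URLs are composed can be sketched as follows. This is purely illustrative shell, not DSpace code; the two variables stand in for ${dspace.server.url} and ${swordv2-server.path}, using the example values from the paragraph above.

```shell
# Stand-ins for the DSpace configuration values (example values only)
DSPACE_SERVER_URL="http://localhost:8080/server"   # ${dspace.server.url}
SWORDV2_PATH="swordv2"                             # ${swordv2-server.path}

# The SWORDv2 service document and a collection deposit URL (handle 123456789/2)
echo "${DSPACE_SERVER_URL}/${SWORDV2_PATH}/servicedocument"
echo "${DSPACE_SERVER_URL}/${SWORDV2_PATH}/collection/123456789/2"
```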
Configuration File: [dspace]/config/modules/swordv2-server.cfg
Property: swordv2-server.enabled
Informational Note: Whether the SWORDv2 module is enabled or disabled (disabled by default). Modifying this setting will require restarting your Servlet Container (usually Tomcat).
Property: swordv2-server.path
Informational Note: When enabled, this is the subpath where the SWORDv2 module is deployed. This path is relative to ${dspace.server.url}. Modifying this setting will require restarting your Servlet Container (usually Tomcat).
Property: swordv2-server.url
Informational Note: The base URL of the SWORD 2.0 system. This defaults to ${dspace.server.url}/${swordv2-server.path}
Property: swordv2-server.collection.url
Informational Note: The base URL of the SWORD collection. This is the URL from which DSpace will construct the deposit location URLs for collections. This defaults to ${dspace.server.url}/${swordv2-server.path}/collection
Property: swordv2-server.servicedocument.url
Informational Note: The base URL of the SWORD service document. This is the URL from which DSpace will construct the service document location URLs for the site, and for individual collections. This defaults to ${dspace.server.url}/${swordv2-server.path}/servicedocument
Property: swordv2-server.accept-packaging.collection
Example Value:
swordv2-server.accept-packaging.collection.METSDSpaceSIP = http://purl.org/net/sword/package
/METSDSpaceSIP
swordv2-server.accept-packaging.collection.SimpleZip = http://purl.org/net/sword/package
/SimpleZip
swordv2-server.accept-packaging.collection.Binary = http://purl.org/net/sword/package/Binary
Informational Note: The accept packaging properties, along with their associated quality values where appropriate.
Package format information
METSDSpaceSIP: zipfile containing mets.xml file describing the resources packed together with it in the root of the zipfile.
Binary: Binary resource that should be taken in as-is, not unpacked
SimpleZip: Zip file that should be unpacked, with each file in the zip ingested separately. No metadata is provided/ingested.
Property: swordv2-server.accept-packaging.item
Example Value:
swordv2-server.accept-packaging.item.METSDSpaceSIP = http://purl.org/net/sword/package
/METSDSpaceSIP
swordv2-server.accept-packaging.item.SimpleZip = http://purl.org/net/sword/package/SimpleZip
swordv2-server.accept-packaging.item.Binary = http://purl.org/net/sword/package/Binary
Informational Note: The accept packaging properties for items. It is possible to configure this for specific collections by adding the handle of the collection to the setting, for example swordv2-server.accept-packaging.collection.[handle].METSDSpaceSIP = http://purl.org/net/sword-types/METSDSpaceSIP
METSDSpaceSIP: zipfile containing mets.xml file describing the resources packed together with it in the root of the zipfile.
Binary: Binary resource that should be taken in as-is, not unpacked
SimpleZip: Zip file that should be unpacked, with each file in the zip ingested separately. No metadata is provided/ingested.
Property: swordv2-server.accepts
Example Value:
swordv2-server.accepts = application/zip, image/jpeg
Informational Note: A comma-separated list of MIME types that SWORD will accept. To accept all MIME types, the value can be set to "*/*"
Property: swordv2-server.expose-communities
Example Value:
swordv2-server.expose-communities = false
Informational Note: Whether or not the server should expose a list of all the communities to a service document request. As deposits can only be made into a collection, it is recommended to leave this set to false.
Property: swordv2-server.max-upload-size
Example Value:
swordv2-server.max-upload-size = 0
Informational Note: The maximum upload size of a package through the SWORD interface (measured in bytes). This is the combined size of all the files, metadata, and manifest file in a package; it is different from the maximum size of a single bitstream.
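As a concrete illustration of the byte arithmetic, a 100 MB cap could be configured in local.cfg as below. The 100 MB figure is an arbitrary example, not a recommended value; judging by the default example value of 0 (and the behaviour documented for the SWORDv1 module), 0 appears to impose no limit.

```
# Limit SWORDv2 package uploads to 100 MB (100 * 1024 * 1024 = 104857600 bytes)
swordv2-server.max-upload-size = 104857600
```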
Property: swordv2-server.keep-original-package
Example Value:
swordv2-server.keep-original-package = true
Informational Note: Should DSpace store a copy of the original SWORD deposit package?
This will cause the deposit process to be slightly slower and more disk space to be used; however, the original files will be preserved. It is recommended to leave this option enabled.
Property: swordv2-server.bundle.name
Example Value:
swordv2-server.bundle.name = SWORD
Informational Note: The bundle name that SWORD should store incoming packages within if swordv2-server.keep-original-package is set to true.
Property: swordv2-server.bundle.deleted
Informational Note: The bundle name that SWORD should use to store deleted bitstreams if swordv2-server.versions.keep is set to true. This will be used in the case that individual files are updated or removed via SWORD. If the entire Media Resource (files in the ORIGINAL bundle) is removed, it will be backed up in its entirety in a bundle of its own.
Property: swordv2-server.keep-package-on-fail
Example Value:
swordv2-server.keep-package-on-fail = false
Informational Note: In the event of package ingest failure, provide an option to store the package on the file system. The default is false. The location can be set using the swordv2-server.failed-package-dir setting.
Property: swordv2-server.failed-package-dir
Example Value:
swordv2-server.failed-package-dir = /dspace/upload
Informational Note: If swordv2-server.keep-package-on-fail is set to true, this is the location where the package will be stored.
Property: swordv2-server.on-behalf-of.enable
Example Value:
swordv2-server.on-behalf-of.enable = true
Informational Note: Should DSpace accept mediated deposits? See the SWORD specification for a detailed explanation of depositing on behalf of another user.
Property: swordv2-server.on-behalf-of.update.mediators
Informational Note: Which user accounts are allowed to perform updates, on behalf of other users, on items which already exist in DSpace?
If this is left blank or omitted, then all accounts can mediate updates to items, which could be a security risk, as there is no implicit checking that the authenticated user is a "legitimate" mediator.
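As an illustrative sketch only: the account names below are placeholders, and the value is assumed to be a comma-separated list of EPerson e-mail addresses. Verify the expected format against the comments in your own swordv2-server.cfg before relying on it.

```
# Restrict mediated updates to two named accounts (placeholder addresses)
swordv2-server.on-behalf-of.update.mediators = [email protected], [email protected]
```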
Property: swordv2-server.verbose-description.receipt.enable
Informational Note: Should the deposit receipt include a verbose description of the deposit? For use by developers; it is recommended to set this to "false" for production systems.
Property: swordv2-server.verbose-description.error.enable
Informational Note: Should the error document include a verbose description of the error? For use by developers, although you may also wish to leave this set to "true" for production systems.
Property: swordv2-server.error.alternate.url
Informational Note: The error document can contain an alternate URL, which the client can use to follow up any issues. For example, this could point to the Contact-Us page.
Property: swordv2-server.error.alternate.content-type
Informational Note: The swordv2-server.error.alternate.url may have an associated content type, such as text/html if it points to a web page. This is used to indicate to the client what content type it can expect if it follows that URL.
Property: swordv2-server.generator.url
Example Value:
swordv2-server.generator.url = http://www.dspace.org/ns/sword/2.0/
Informational Note: The URL which identifies DSpace as the software that is providing the SWORD interface.
Property: swordv2-server.generator.version
Example Value:
swordv2-server.generator.version = 2.0
Property: swordv2-server.auth-type
Example Value:
swordv2-server.auth-type = Basic
Informational Note: Which form of authentication to use. Normally this is set to Basic in order to use HTTP Basic authentication.
Property: swordv2-server.upload.tempdir
Example Value:
swordv2-server.upload.tempdir = /dspace/upload
Informational Note: The location where uploaded files and packages are stored while being processed.
Property: swordv2-server.updated.field
Example Value:
swordv2-server.updated.field = dc.date.updated
Informational Note: The metadata field in which to store the updated date for items deposited via SWORD.
Property: swordv2-server.slug.field
Example Value:
swordv2-server.slug.field = dc.identifier.slug
Informational Note: The metadata field in which to store the value of the slug header if it is supplied.
Property: swordv2-server.author.field
Example Value:
swordv2-server.author.field = dc.contributor.author
Informational Note: The metadata field in which to store the value of the atom entry author if it is supplied.
Property: swordv2-server.title.field
Example Value:
swordv2-server.title.field = dc.title
Informational Note: The metadata field in which to store the value of the atom entry title if it is supplied.
Property: swordv2-server.disseminate-packaging
Example Value:
swordv2-server.disseminate-packaging.METSDSpaceSIP = http://purl.org/net/sword/package
/METSDSpaceSIP
swordv2-server.disseminate-packaging.SimpleZip = http://purl.org/net/sword/package/SimpleZip
Property: swordv2-server.statement.bundles
Informational Note: Which bundles should the Statement include in its list of aggregated resources? The Statement will automatically mark any bitstreams which are in the bundle identified by the ${bundle.name} property, provided that bundle is also listed here (i.e. if you want Original Deposits to be listed in the Statement, then you should add the SWORD bundle to this list).
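As an illustrative sketch: ORIGINAL and SWORD are the conventional DSpace bundle names (SWORD being the default value of swordv2-server.bundle.name documented above), but confirm the default list in your own swordv2-server.cfg before relying on it.

```
# Include both current files (ORIGINAL) and original deposits (SWORD) in the Statement
swordv2-server.statement.bundles = ORIGINAL, SWORD
```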
Property: plugin.single.org.dspace.sword2.WorkflowManager
Example Value:
plugin.single.org.dspace.sword2.WorkflowManager = org.dspace.sword2.WorkflowManagerDefault
Property: swordv2-server.workflowmanagerdefault.always-update-metadata
Informational Note: Should the WorkflowManagerDefault plugin allow updates to the item's metadata to take place on items which are in states other than the workspace (e.g. in the workflow, archive, or withdrawn)?
Property: swordv2-server.workflowmanagerdefault.file-replace.enable
Informational Note: Should the server allow PUT to individual files?
If this is enabled, then DSpace may be used with the DepositMO SWORD extensions, BUT the caveat is that DSpace does not formally support Bitstream replace, so this is equivalent to a DELETE and then a POST, which violates the RESTfulness of the server. The resulting file DOES NOT have the same identifier as the file it was replacing. As such, it is STRONGLY RECOMMENDED to leave this option turned off unless working explicitly with DepositMO-enabled client environments.
Property: swordv2-server.mets-ingester.package-ingester
Example Value:
swordv2-server.mets-ingester.package-ingester = METS
Property: swordv2-server.restore-mode.enable
Example Value:
swordv2-server.restore-mode.enable = false
Informational Note: Should the SWORD server enable restore-mode when ingesting new packages? If this is enabled, the item will be treated as a previously deleted item from the repository. If the item has previously been assigned a handle, then that same handle will be restored to activity.
Property: swordv2-server.simpledc.*
Example Value:
swordv2-server.simpledc.abstract = dc.description.abstract
swordv2-server.simpledc.date = dc.date
swordv2-server.simpledc.rights = dc.rights
Informational Note: Configuration of the metadata field mapping used by the SimpleDCEntryIngester, SimpleDCEntryDisseminator and FeedContentDisseminator.
Property: swordv2-server.atom.*
Informational Note: Configuration of the metadata field mapping used by the SimpleDCEntryIngester, SimpleDCEntryDisseminator and FeedContentDisseminator.
Property: swordv2-server.metadata.replaceable
Informational Note: Used by SimpleDCEntryIngester: which metadata fields can be replaced during a PUT to the Item of an Atom Entry document? Fields listed here are the ones which will be removed when a new PUT comes through (irrespective of whether there is a new incoming value to replace them).
Property: swordv2-server.multipart.entry-first
Example Value:
swordv2-server.multipart.entry-first = false
Informational Note: The order of precedence for importing multipart content. If this is set to true, then metadata in the package will override metadata in the atom entry; otherwise the metadata in the atom entry will override that from the package.
Property: swordv2-server.workflow.notify
Example Value:
swordv2-server.workflow.notify = true
Informational Note: If the workflow gets started (i.e. the collection being deposited into has a workflow configured), should a notification be sent?
Property: swordv2-server.versions.keep
Example Value:
swordv2-server.versions.keep = true
Informational Note: When content is replaced, should the old version be kept? This creates a copy of the ORIGINAL bundle with the name V_YYYY-MM-DD.X, where YYYY-MM-DD is the date the copy was created and X is an integer from 0 upwards.
Property: swordv2-server.state.*
Example Value:
swordv2-server.state.workspace.uri = http://dspace.org/state/inprogress
swordv2-server.state.workspace.description = The item is in the user workspace
swordv2-server.state.workflow.uri = http://dspace.org/state/inreview
swordv2-server.state.workflow.description = The item is undergoing review prior to acceptance
in the archive
Informational Note: Pairs of states (URI and description) that items can be in. Typical states are workspace, workflow, archive, and withdrawn.
Property: swordv2-server.workspace.url-template
Informational Note: URL template for links to items in the workspace (items in the archive will use the handle). The #wsid# URL parameter will be replaced with the workspace id of the item.
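As an illustrative sketch of such a template: the path below assumes the standard Angular UI's workspace-item edit route, which is an assumption on my part; verify the route against your UI before relying on it.

```
# Hypothetical template pointing #wsid# at the UI's workspace edit page
swordv2-server.workspace.url-template = ${dspace.ui.url}/workspaceitems/#wsid#/edit
```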
Other configuration options exist that define the mapping between mime types, ingesters, and disseminators. A typical configuration looks like this:
plugin.named.org.dspace.sword2.SwordContentIngester = \
org.dspace.sword2.SimpleZipContentIngester = http://purl.org/net/sword/package/SimpleZip, \
org.dspace.sword2.SwordMETSIngester = http://purl.org/net/sword/package/METSDSpaceSIP, \
org.dspace.sword2.BinaryContentIngester = http://purl.org/net/sword/package/Binary
plugin.single.org.dspace.sword2.SwordEntryIngester = \
org.dspace.sword2.SimpleDCEntryIngester
plugin.single.org.dspace.sword2.SwordEntryDisseminator = \
org.dspace.sword2.SimpleDCEntryDisseminator
# note that we replace ";" with "_" as ";" is not permitted in the PluginManager names
plugin.named.org.dspace.sword2.SwordContentDisseminator = \
org.dspace.sword2.SimpleZipContentDisseminator = http://purl.org/net/sword/package/SimpleZip, \
org.dspace.sword2.FeedContentDisseminator = application/atom+xml, \
org.dspace.sword2.FeedContentDisseminator = application/atom+xml_type_feed
# note that we replace ";" with "_" as ";" is not permitted in the PluginManager names
plugin.named.org.dspace.sword2.SwordStatementDisseminator = \
org.dspace.sword2.AtomStatementDisseminator = atom, \
org.dspace.sword2.OreStatementDisseminator = rdf, \
org.dspace.sword2.AtomStatementDisseminator = application/atom+xml_type_feed, \
org.dspace.sword2.OreStatementDisseminator = application/rdf+xml
# Deposit a SWORD Zip package named "sword-package.zip" into a DSpace Collection (Handle 123456789/2) as user "[email protected]"
# (Please note that you WILL need to modify the Collection location, user/password and name of the SWORD package)

# Example of retrieving Item information via the "edit-media" path in ATOM format (can be run on any item within DSpace, but requires authentication)
# NOTE: Accept header is required, and must be a format supported by a SwordContentDisseminator plugin (see configuration above)
curl -i -H "Accept:application/atom+xml" -u [email protected]:[password] -X GET http://[dspace.url]/swordv2/edit-media/[internal-item-identifier]
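The deposit command that the first comment block above refers to is not shown in this copy. The following is a hedged sketch of what such a SWORDv2 deposit might look like; the header set and the collection URL pattern are assumptions based on the SWORD 2.0 profile, so verify them against the SWORDv2 documentation before use. The Packaging header value should match one of the accept-packaging URIs configured above.

```
# Sketch only: deposit a METSDSpaceSIP zip package into collection 123456789/2
curl -i -u [email protected]:[password] \
     --data-binary "@sword-package.zip" \
     -H "Content-Type: application/zip" \
     -H "Content-Disposition: filename=sword-package.zip" \
     -H "Packaging: http://purl.org/net/sword/package/METSDSpaceSIP" \
     -X POST http://[dspace.url]/swordv2/collection/123456789/2
```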
Troubleshooting
If requests return an empty response containing only an XML declaration (<?xml version="1.0"?>), check that you are using the service document URL that matches the protocol version. For example, on the DSpace 7 demo server:
https://api7.dspace.org/server/sword/servicedocument (SWORDv1)
https://api7.dspace.org/server/swordv2/servicedocument (SWORDv2)
SWORDv1 Server
SWORD (Simple Web-service Offering Repository Deposit) is a protocol that allows the remote deposit of items into repositories. DSpace implements the
SWORD protocol via the 'sword' web application. The version of SWORD v1 currently supported by DSpace is 1.3. The specification and further
information can be found at http://swordapp.org.
SWORD is based on the Atom Publishing Protocol and allows service documents to be requested, which describe the structure of the repository, and
packages to be deposited.
To enable the SWORDv1 server, add the following to your local.cfg:
sword-server.enabled = true
# Optionally, if you wish to change its path
sword-server.path = sword
Keep in mind, modifying these settings will require restarting your Servlet Container (usually Tomcat).
Once enabled, the SWORD v1 module will be available at ${dspace.server.url}/${sword-server.path}. For example, if "dspace.server.url=http://localhost:
8080/server", then (by default) it will be available at http://localhost:8080/server/sword/
Configuration File: [dspace]/config/modules/sword-server.cfg
Property: sword-server.enabled
Informational Note: Whether the SWORDv1 module is enabled or disabled (disabled by default). Modifying this setting will require restarting your Servlet Container (usually Tomcat).
Property: sword-server.path
Informational Note: When enabled, this is the subpath where the SWORDv1 module is deployed. This path is relative to ${dspace.server.url}. Modifying this setting will require restarting your Servlet Container (usually Tomcat).
Property: sword-server.mets-ingester.package-ingester
Informational Note: This property key tells the SWORD METS implementation which package ingester to use to install deposited content. This should refer to one of the classes configured for:
plugin.named.org.dspace.content.packager.PackageIngester
The value of sword-server.mets-ingester.package-ingester tells the system which named plugin for this interface should be used to ingest SWORD METS packages.
Properties: mets.default.ingest.crosswalk.EPDCX
mets.default.ingest.crosswalk.*
(NOTE: These configs are in the dspace.cfg file as they are used by many interfaces)
Informational Note: Define the metadata types which can be accepted/handled by SWORD during ingest of a package. Currently, EPDCX (EPrints DC XML) is the recommended default metadata format, but others are supported. An example of an EPDCX SWORD package can be found at [dspace-src]/dspace-sword/example/example.zip.
Additional metadata types can be added to this list by simply defining new configurations. For example, you can map a new "mdtype" MYFORMAT to a custom crosswalk named MYFORMAT:
mets.submission.crosswalk.MYFORMAT = MYFORMAT
You'd also want to map your new custom crosswalk to a stylesheet using the next configuration (crosswalk.submission.*.stylesheet).
Property: crosswalk.submission.EPDCX.stylesheet
(NOTE: This configuration is in the dspace.cfg file)
Informational Note: Define the stylesheet which will be used by the self-named XSLTIngestionCrosswalk class when asked to load the SWORD configuration (as specified above). This will use the specified stylesheet to crosswalk the incoming SWAP metadata to the DIM format for ingestion.
Additional crosswalk types can be added to this list by simply defining new configurations. For example, you can map a custom crosswalk named MYFORMAT to use a specific "my-crosswalk.xsl" stylesheet:
crosswalk.submission.MYFORMAT.stylesheet = crosswalks/my-crosswalk.xsl
Keep in mind, you'll also need to ensure the MYFORMAT crosswalk is defined by the previous configuration (mets.submission.crosswalk.*).
Property: sword-server.deposit.url
Example Value:
sword-server.deposit.url = http://www.myu.ac.uk/sword/deposit
Informational Note: The base URL of the SWORD deposit. This is the URL from which DSpace will construct the deposit location URLs for collections. The default is ${dspace.server.url}/${sword-server.path}/deposit. In the event that you are not deploying DSpace as the ROOT application in the servlet container, this will generate incorrect URLs, and you should override the functionality by specifying the URL in full, as shown in the example value.
Property: sword-server.servicedocument.url
Example Value:
sword-server.servicedocument.url = http://www.myu.ac.uk/sword/servicedocument
Informational Note: The base URL of the SWORD service document. This is the URL from which DSpace will construct the service document location URLs for the site, and for individual collections. The default is ${dspace.server.url}/${sword-server.path}/servicedocument. In the event that you are not deploying DSpace as the ROOT application in the servlet container, this will generate incorrect URLs, and you should override the functionality by specifying the URL in full, as shown in the example value.
Property: sword-server.media-link.url
Example Value:
sword-server.media-link.url = http://www.myu.ac.uk/sword/media-link
Informational Note: The base URL of the SWORD media links. This is the URL which DSpace will use to construct the media link URLs for items which are deposited via sword. The default is ${dspace.server.url}/${sword-server.path}/media-link. In the event that you are not deploying DSpace as the ROOT application in the servlet container, this will generate incorrect URLs, and you should override the functionality by specifying the URL in full, as shown in the example value.
Property: sword-server.generator.url
Example Value:
sword-server.generator.url = http://www.dspace.org/ns/sword/1.3.1
Informational Note: The URL which identifies the SWORD software which provides the sword interface. This is the URL which DSpace will use to fill out the atom:generator element of its atom documents. The default is http://www.dspace.org/ns/sword/1.3.1.
If you have modified your SWORD software, you should change this URI to identify your own version. If you are using the standard 'dspace-sword' module you will not, in general, need to change this setting.
Property: sword-server.updated.field
Example Value: sword-server.updated.field = dc.date.updated
Informational Note: The metadata field in which to store the updated date for items deposited via SWORD.
Property: sword-server.slug.field
Informational Note: The metadata field in which to store the value of the slug header if it is supplied.
Properties:
sword-server.accept-packaging.METSDSpaceSIP.identifier
sword-server.accept-packaging.METSDSpaceSIP.q
Example Value:
sword-server.accept-packaging.METSDSpaceSIP.identifier = http://purl.org/net/sword-types
/METSDSpaceSIP
sword-server.accept-packaging.METSDSpaceSIP.q = 1.0
Informational Note: The accept packaging properties, along with their associated quality values where appropriate. This is a global setting; these will be used on all DSpace collections.
Property: sword-server.accepts
Informational Note: A comma-separated list of MIME types that SWORD will accept.
Properties:
sword-server.accept-packaging.[handle].METSDSpaceSIP.identifier
sword-server.accept-packaging.[handle].METSDSpaceSIP.q
Example Value:
sword-server.accept-packaging.[handle].METSDSpaceSIP.identifier = http://purl.org/net/sword-
types/METSDSpaceSIP
sword-server.accept-packaging.[handle].METSDSpaceSIP.q = 1.0
Informational Note: Collection-specific settings: these will be used on the collections with the given handles.
Property: sword-server.expose-items
Informational Note: Should the server offer up items in collections as sword deposit targets? This is effected by placing a URI in the collection description which will, on request, list all the allowed items for the depositing user in that collection. NOTE: this requires an implementation of deposit onto items, which is not yet available.
Property: sword-server.expose-communities
Informational Note: Should the server offer, by default, the list of all Communities to a Service Document request? If false, the server will offer the list of all collections, which is the default and recommended behavior at this stage. NOTE: a service document for Communities will not offer any viable deposit targets, and the client will need to request the list of Collections in the target before deposit can continue.
Property: sword-server.max-upload-size
Informational Note: The maximum upload size of a package through the sword interface, in bytes. This will be the combined size of all the files, the metadata and any manifest data. It is NOT the same as the maximum size set for an individual file upload through the user interface. If not set, or set to 0, the sword service will default to no limit.
Property: sword-server.keep-original-package
Informational Note: Whether or not DSpace should store a copy of the original sword deposit package. NOTE: this will cause the deposit process to run slightly slower, and will accelerate the rate at which the repository consumes disk space. BUT, it will also mean that the deposited packages are recoverable in their original form. It is therefore strongly recommended to leave this option turned on. When set to "true", this requires that the configuration option upload.temp.dir (in dspace.cfg) is set to a valid location.
Property: sword-server.bundle.name
Informational Note: The bundle name that SWORD should store incoming packages under if sword-server.keep-original-package is set to true. The default is "SWORD" if no value is set.
Properties: sword-server.keep-package-on-fail
sword-server.failed-package.dir
Example Value:
sword-server.keep-package-on-fail=true
sword-server.failed-package.dir=${dspace.dir}/upload
Informational Note: In the event of package ingest failure, provide an option to store the package on the file system. The default is false.
Property: sword-server.identify-version
Informational Note: Should the server identify the sword version in a deposit response? It is recommended to leave this unchanged.
Property: sword-server.on-behalf-of.enable
Informational Note: Should mediated deposit via sword be supported? If enabled, this will allow users to deposit content packages on behalf of other users.
Property: sword-server.restore-mode.enable
Informational Note: Should the sword server enable restore-mode when ingesting new packages? If this is enabled, the item will be treated as a previously deleted item from the repository. If the item had previously been assigned a handle, then that same handle will be restored to activity. If the item had not previously been assigned a handle, then a new handle will be assigned.
Property: plugin.named.org.dspace.sword.SWORDIngester
Example Value:
plugin.named.org.dspace.sword.SWORDIngester = \
org.dspace.sword.SWORDMETSIngester = http://purl.org/net/sword-types/METSDSpaceSIP, \
org.dspace.sword.SimpleFileIngester = SimpleFileIngester
Informational Note: Configure the plugins to process incoming packages. The form of this configuration is as per the Plugin Manager's Named Plugin documentation: plugin.named.[interface] = [implementation] = [package format identifier] (see dspace.cfg). Package ingesters should implement the SWORDIngester interface, and will be loaded when a package of the format specified above in sword-server.accept-packaging.[package format].identifier = [package format identifier] is received. In the event that this is a simple file deposit, with no package format, then the class named by "SimpleFileIngester" will be loaded and executed where appropriate. This case will only occur when a single file is being deposited into an existing DSpace Item.
A variety of SWORDv1 clients (in various languages/tools) are available at http://swordapp.org/sword-v1/
DSpace also comes with an optional SWORDv1 Client which can be enabled to deposit content from one DSpace to another.
An example SWORDv1 ZIP package is available in the DSpace codebase at: https://github.com/DSpace/DSpace/tree/dspace-5_x/dspace-sword/example
Finally, it is also possible to deposit a valid SWORD Zip package via common Linux command-line tools (e.g. curl). For example:
# Deposit a SWORD Zip package named "sword-package.zip" into a DSpace Collection (Handle 123456789/2) as user "[email protected]"
# (Please note that you WILL obviously need to modify the Collection location, user/password and name of the SWORD package)
The SWORDv1 and SWORDv2 service documents for the DSpace demo server can be retrieved from:
https://api7.dspace.org/server/sword/servicedocument
https://api7.dspace.org/server/swordv2/servicedocument
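A hedged sketch of such a curl deposit against the SWORDv1 endpoint above. The endpoint path, headers, and packaging URI are assumptions based on typical SWORDv1 usage; verify them against your own server before use:

```shell
# Illustrative only: deposit sword-package.zip into Collection 123456789/2.
# Replace USER:PASSWORD and the URL with values for your own repository.
curl -i -u "USER:PASSWORD" \
     --data-binary "@sword-package.zip" \
     -H "Content-Type: application/zip" \
     -H "Content-Disposition: filename=sword-package.zip" \
     -H "X-Packaging: http://purl.org/net/sword-types/METSDSpaceSIP" \
     "https://api7.dspace.org/server/sword/deposit/123456789/2"
```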
Exporting and Importing Community and Collection
Hierarchy
1 Community and Collection Structure Importer
1.1 Usage
1.2 XML Import Format
2 Community and Collection Structure Exporter
2.1 Usage
Usage
-f Source xml file. The presence of this argument engages import mode.
-o Output xml file. Required. A copy of the input augmented with the Handles assigned to each new
Community or Collection.
<import_structure>
<community>
<name>Community Name</name>
<description>Descriptive text</description>
<intro>Introductory text</intro>
<copyright>Special copyright notice</copyright>
<sidebar>Sidebar text</sidebar>
<community>
<name>Sub Community Name</name>
<community> ...[ad infinitum]...
</community>
</community>
<collection>
<name>Collection Name</name>
<description>Descriptive text</description>
<intro>Introductory text</intro>
<copyright>Special copyright notice</copyright>
<sidebar>Sidebar text</sidebar>
<license>Special licence</license>
<provenance>Provenance information</provenance>
</collection>
</community>
</import_structure>
The output file will be identical to the source XML, except that each community and collection element will carry an identifier attribute containing its assigned handle:
<import_structure>
<community identifier="123456789/1">
<name>Community Name</name>
<description>Descriptive text</description>
<intro>Introductory text</intro>
<copyright>Special copyright notice</copyright>
<sidebar>Sidebar text</sidebar>
<community identifier="123456789/2">
<name>Sub Community Name</name>
<community identifier="123456789/3"> ...[ad infinitum]...
</community>
</community>
<collection identifier="123456789/4">
<name>Collection Name</name>
<description>Descriptive text</description>
<intro>Introductory text</intro>
<copyright>Special copyright notice</copyright>
<sidebar>Sidebar text</sidebar>
<license>Special licence</license>
<provenance>Provenance information</provenance>
</collection>
</community>
</import_structure>
This command-line tool gives you the ability to import a community and collection structure directly from a source XML file. It is executed as follows:
This will examine the contents of source.xml, import the structure into DSpace while logged in as the supplied administrator, and then output the same
structure to the output file, but including the handle for each imported community and collection as an attribute.
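The invocation described above can be sketched as follows, assuming the standard [dspace]/bin/dspace launcher and its structure-builder script (file names and the administrator account are illustrative):

```shell
# Import the structure defined in source.xml, authorized as an administrator,
# and write the augmented copy (with assigned Handles) to output.xml.
[dspace]/bin/dspace structure-builder -f source.xml -o output.xml -e [email protected]
```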
Usage
-x, --export Export the current structure as XML. The presence of this argument engages export mode.
-e, --eperson email or netid User who is manipulating the repository's structure. Required. This user's rights determine access to
communities and collections.
-o, --output file path The exported structure is written here. Required.
-h or -? Help
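Correspondingly, export mode can be sketched as follows, again assuming the standard launcher (the output path is illustrative):

```shell
# Export the current community/collection hierarchy to export.xml.
[dspace]/bin/dspace structure-builder -x -e [email protected] -o export.xml
```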
Importing Items via basic bibliographic formats (Endnote,
BibTex, RIS, CSV, etc) and online services (arXiv, PubMed,
CrossRef, CiNii, etc)
1 Introduction
2 Supported External Sources
3 Disabling an External source
4 Submitting starting from external sources
5 Submitting starting from bibliographic file
6 More Information
In DSpace 7.0, the Biblio-Transformation-Engine (BTE) was removed in favor of Live Import from external sources. All online services and bibliographic
formats previously supported by BTE have been moved or are being moved to the External Sources framework.
Introduction
This documentation explains the features and the usage of the importer framework. The importer framework is built into both the REST API and User
Interface. Currently supported formats include:
Drag & drop of Endnote, BibTex, RIS, TSV, CSV, arXiv, PubMed. From the MyDSpace page, dragging & dropping one of these files will start a
new submission, extracting the metadata from the file.
Import via ORCID, PubMed, Sherpa Journals, Sherpa Publishers. From the MyDSpace page, you can select to start a new submission by
searching an external source.
Supported online services are all configured on the backend in the [dspace]/config/spring/api/external-services.xml file. To disable a
service, simply comment it out in that file.
NASA Astrophysics Data System (ADS) lookup (Supported for creating new Items, or "Publication" Entities). Can be configured via "ads.*" settings in external-providers.cfg. REQUIRES an API key to function; sign up at https://ui.adsabs.harvard.edu/help/api/
arXiv lookup (Supported for creating new Items, or "Publication" Entities).
CiNii lookup (Supported for creating new Items, or "Publication" Entities). Can be configured via "cinii.*" settings in external-providers.cfg. REQUIRES an API key to function; sign up at https://support.nii.ac.jp/en/cinii/api/developer
CrossRef lookup (Supported for creating new Items, or "Publication" Entities). Can be configured via "crossref.*" settings in external-providers.cfg
European Patent Office (EPO) lookup (Supported for creating new Items, or "Publication" Entities). Can be configured via "epo.*" settings in external-providers.cfg. REQUIRES an API key to function; sign up at https://developers.epo.org/
ORCID
ORCID author lookup (Only supported for creating new "Person" Entities). Can be configured via "orcid.*" settings in orcid.cfg.
ORCID publication lookup (Supported for creating new Items, or "Publication" Entities). Allows you to look up a publication based on an author's ORCID. Can be configured via "orcid.*" settings in orcid.cfg.
PubMed
Search PubMed (Supported for creating new Items, or "Publication" Entities). Can be configured via "pubmed.*" settings in external-providers.cfg
Search PubMed Europe (Supported for creating new Items, or "Publication" Entities). Can be configured via "pubmedeurope.*" settings in external-providers.cfg
SciELO lookup (Supported for creating new Items, or "Publication" Entities). Can be configured via "scielo.*" settings in external-providers.cfg.
Scopus lookup (Supported for creating new Items, or "Publication" Entities). Can be configured via "scopus.*" settings in external-providers.cfg. REQUIRES an API key to function; sign up at https://dev.elsevier.com
Sherpa Romeo
Sherpa Journals by ISSN (Only supported for creating new "Journal" Entities)
Sherpa Journals (Only supported for creating new "Journal" Entities) - supports looking up a Journal by title
Sherpa Publishers (Only supported for creating new "OrgUnit" Entities)
VuFind lookup (Supported for creating new Items, or "Publication" Entities). Can be configured via "vufind.*" settings in external-providers.cfg
Web of Science lookup (Supported for creating new Items, or "Publication" Entities). Can be configured via "wos.*" settings in external-providers.cfg. REQUIRES a paid API key to function; sign up at https://developer.clarivate.com/apis/wos
Currently this WOS integration requires a paid license and does NOT yet support the WOS Starter API. See this issue ticket for more information: https://github.com/DSpace/DSpace/issues/8695
To disable an external source, simply comment out its "<bean>" tag in the external-services.xml file, using XML comment tags (<!-- and -->).
For example, this will disable the Scopus external service (which is one that requires a paid subscription):
<!--
<bean id="scopusLiveImportDataProvider" class="org.dspace.external.provider.impl.LiveImportDataProvider">
<property name="metadataSource" ref="ScopusImportService"/>
<property name="sourceIdentifier" value="scopus"/>
<property name="recordIdMetadata" value="dc.identifier.scopus"/>
<property name="supportedEntityTypes">
<list>
<value>Publication</value>
</list>
</property>
</bean>
-->
1. From the MyDSpace page, a new submission can be started not only by using the submission form, but also by automatically populating metadata imported from several online services.
2. After choosing the external source to import from and entering a term in the search bar, the system will show the list of matching results.
3. When selecting an item, the system will display the metadata to be imported, according to the configured mapping.
4. Clicking on "Start submission", the system will display the submission forms filled with the imported metadata.
More Information
More information on configuring metadata mappings for various import formats / services can be found in the Live Import from external sources
documentation. See the "Editing Metadata Mapping" section.
Registering Bitstreams via Simple Archive Format
1 Overview
1.1 Accessible Storage
1.2 Registering Items Using the Item Importer
1.3 Internal Identification and Retrieval of Registered Items
1.4 Exporting Registered Items
1.5 Deleting Registered Items
The procedures below will not import the actual bitstreams into DSpace. They will merely inform DSpace of an existing location where these Bitstreams
can be found. Please refer to Importing and Exporting Items via Simple Archive Format (SAF) for information on importing metadata and bitstreams.
Overview
Registration is an alternate means of incorporating items, their metadata, and their bitstreams into DSpace by taking advantage of the bitstreams already
being in storage accessible to DSpace. An example might be that there is a repository for existing digital assets. Rather than using the normal interactive
ingest process or the batch import to furnish DSpace the metadata and to upload bitstreams, registration provides DSpace the metadata and the location
of the bitstreams. DSpace uses a variation of the import tool to accomplish registration.
Accessible Storage
To register an item, its bitstreams must reside on storage accessible to DSpace and therefore be referenced by an asset store number in dspace.cfg. The
configuration file dspace.cfg establishes one or more asset stores through the use of an integer asset store number. This number relates to a directory in
the DSpace host's file system or a set of SRB account parameters. This asset store number is described in The dspace.cfg Configuration Properties File
section and in the dspace.cfg file itself. The asset store number(s) used for registered items should generally not be the value of the assetstore.incoming
property since it is unlikely that you will want to mix the bitstreams of normally ingested and imported items and registered items.
The DSpace Simple Archive Format for registration does not include the actual content files (bitstreams) being registered. The format is however a
directory full of items to be registered, with a subdirectory per item. Each item directory contains a file for the item's descriptive metadata (dublin_core.xml)
and a file listing the item's content files (contents), but not the actual content files themselves.
The dublin_core.xml file for item registration is exactly the same as for regular item import.
The contents file, like that for regular item import, lists the item's content files, one content file per line, but each line has one of the following formats:
-r -s n -f filepath
-r -s n -f filepath\tbundle:bundlename
-r -s n -f filepath\tbundle:bundlename\tpermissions: -[r|w] 'group name'
-r -s n -f filepath\tbundle:bundlename\tpermissions: -[r|w] 'group name'\tdescription: some text
where -r indicates that the file is to be registered rather than uploaded, -s n indicates the asset store number (n) in which the file resides, and -f filepath indicates the file path and name, relative to that asset store.
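For instance, a minimal contents file registering one bitstream might be generated like this (the asset store number, path, and bundle name are illustrative):

```shell
# One line per registered file: asset store 1, path relative to that store,
# placed in the ORIGINAL bundle. Fields after the path are tab-separated.
printf -- '-r -s 1 -f images/paper.pdf\tbundle:ORIGINAL\n' > contents
cat contents
```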
The command line for registration is just like the one for regular import:
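That is, registration reuses the import tool; a sketch, assuming the standard launcher (the collection Handle, paths, and account are illustrative):

```shell
# Register the items under items_dir into Collection 123456789/14,
# recording the directory-to-Handle mapping in mapfile.
[dspace]/bin/dspace import -a -e [email protected] -c 123456789/14 -s items_dir -m mapfile
```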
The --workflow and --test flags will function as described in Importing Items.
The --delete flag will function as described in Importing Items but the registered content files will not be removed from storage. See Deleting Registered
Items.
The --replace flag will function as described in Importing Items, but care should be taken to consider the different cases and their implications. With old items and new items being registered or ingested normally, there are four combinations or cases to consider. Foremost, an old registered item deleted from DSpace using --replace will not be removed from the storage where it resides. See Deleting Registered Items. A new item added to DSpace using --replace will be ingested normally or will be registered, depending on whether or not it is marked in the contents file with the -r.
Internal Identification and Retrieval of Registered Items
First, the randomly generated internal ID is not used because DSpace does not control the file path and name of the bitstream. Instead, the file path and
name are that specified in the contents file.
Second, the store_number column of the bitstream database row contains the asset store number specified in the contents file.
Third, the internal_id column of the bitstream database row contains a leading flag (-R) followed by the registered file path and name. For example, -
Rfilepath where filepath is the file path and name relative to the asset store corresponding to the asset store number. The asset store could be
traditional storage in the DSpace server's file system or an SRB account.
Fourth, an MD5 checksum is calculated by reading the registered file if it is in local storage.
Registered items and their bitstreams can be retrieved transparently just like normally ingested items.
Importing and Exporting Items via Simple Archive Format
(SAF)
1 Item Importer and Exporter
1.1 DSpace Simple Archive Format
1.1.1 dublin_core.xml or metadata_[prefix].xml
1.1.2 contents file
1.1.3 relationships file
1.2 Configuring metadata_[prefix].xml for a Different Schema
1.3 Importing Items
1.3.1 Adding Items to a Collection from a directory
1.3.2 Adding Items to a Collection from a zipfile
1.3.3 Replacing Items in a Collection
1.3.4 Deleting or Unimporting Items in a Collection
1.3.5 Other Options
1.3.6 UI Batch Import
1.4 Exporting Items
1.4.1 UI Batch Export
archive_directory/
item_000/
dublin_core.xml -- qualified Dublin Core metadata for metadata fields belonging to the 'dc'
schema.
metadata_[prefix].xml -- metadata in another schema. The prefix is the name of the schema as
registered with the metadata registry.
contents -- text file containing one line per filename.
collections -- (Optional) text file that contains the handles of
the collections the item will belong to. Each handle in a row.
-- Collection in first line will be the owning
collection.
handle -- contains the handle assigned/to be assigned to
this resource
relationships -- (Optional) If importing Entities, you can specify one or more relationships
to create on import
file_1.doc -- files to be added as bitstreams to the item.
file_2.pdf
item_001/
dublin_core.xml
contents
file_1.png
...
dublin_core.xml or metadata_[prefix].xml
The dublin_core.xml or metadata_[prefix].xml file has the following format, where each metadata element has its own entry within a <dcvalue> tagset. There are currently three attributes available in the <dcvalue> tagset: element, qualifier, and language.
<dublin_core>
<dcvalue element="title" qualifier="none">A Tale of Two Cities</dcvalue>
<dcvalue element="date" qualifier="issued">1990</dcvalue>
<dcvalue element="title" qualifier="alternative" language="fr">J'aime les Printemps</dcvalue>
</dublin_core>
(Note the optional language attribute, which notifies the system that the alternative title is in French.)
When providing URLs as values for fields, any ampersand (&) symbols in those URLs have to be encoded as &amp;
Every metadata field used must first be registered via the metadata registry of the DSpace instance. See Metadata and Bitstream Format Registries.
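As a quick sanity check before an import, you can verify that a dublin_core.xml file is well-formed XML, for example with a one-line python3 call (this helper step is our own suggestion, not part of the DSpace tooling):

```shell
# Write a sample dublin_core.xml and confirm it parses as well-formed XML.
cat > dublin_core.xml <<'EOF'
<dublin_core>
  <dcvalue element="title" qualifier="none">A Tale of Two Cities</dcvalue>
  <dcvalue element="date" qualifier="issued">1990</dcvalue>
</dublin_core>
EOF
python3 -c "import xml.etree.ElementTree as ET; ET.parse('dublin_core.xml')" \
  && echo "dublin_core.xml is well-formed"
```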
Recommended Metadata
It is recommended to minimally provide "dc.title" and, where applicable, "dc.date.issued". Obviously you can (and should) provide much more detailed
metadata about the Item. For more information see: Metadata Recommendations.
contents file
The contents file is a plain text document that simply enumerates, one file per line, the bitstream file names. See the following example:
file_1.doc
file_2.pdf
license
Please note that the license file is optional; if you wish to have one included, you can place the file in the .../item_001/ directory, for example. In the contents file, each bitstream file name may optionally be followed by any of the following:
\tbundle:BUNDLENAME
\tpermissions:PERMISSIONS
\tdescription:DESCRIPTION
\tprimary:true
'BUNDLENAME' is the name of the bundle to which the bitstream should be added. Without specifying the bundle, bitstreams will go into the default bundle, ORIGINAL.
The IIIF metadata feature was added in 7.2 and is only supported on import ('add' mode) of an SAF package.
For IIIF enabled items, the bitstream name may optionally be followed by any of the following:
\tiiif-label:IIIFLABEL
\tiiif-toc:IIIFTOC
\tiiif-width:IIIFWIDTH
\tiiif-height:IIIFHEIGHT
Where:
'IIIFLABEL' is the label that will be used for the image in the viewer.
'IIIFTOC' is the label that will be used for a table of contents entry in the viewer.
'IIIFWIDTH' is the image width that will be used for the IIIF canvas.
'IIIFHEIGHT' is the image height that will be used for the IIIF canvas.
relationships file
Supported in 7.1 or above for 'import' only.
This feature was added in 7.1. Currently the 'relationships' file is only supported on import ('add' mode) of an SAF package. See note at bottom of this
section about using the "metadata_relation.xml" if you wish to export & update relationships.
The optional relationships file enumerates the relationships of this Entity to other Entities (either already in the system, or also specified in your SAF
import batch). This allows entities to be linked to new or existing entities during import. Entities can be linked to other entities in this import set by referring
to their import subfolder name. Because relationships can only be created for Entities, it can only be used when importing Configurable Entities.
Each line in the file contains a relationship type key and an item identifier in the following format:
relation.<relation_key> <handle|uuid|folderName:import_item_folder|schema.element[.qualifier]:value>
The import_item_folder value (after the folderName: prefix) should refer to the folder name of another item in this import batch. Examples:
relation.isAuthorOfPublication 5dace143-1238-4b4f-affb-ed559f9254bb
relation.isAuthorOfPublication 123456789/1123
relation.isOrgUnitOfPublication folderName:item_001
relation.isProjectOfPublication project.identifier.id:123
relation.isProjectOfPublication project.identifier.name:A Name with Spaces
During initial import, new items are stored in a map keyed by the item folder name. Once the initial import is complete, a second pass checks for a
'relationships' manifest file in each folder and creates a relationship of the specified type to the specified item.
Remember, if you are creating new Entities via an SAF package, those Entities MUST specify a "dspace.entity.type" metadata field. Because this
metadata field is in the "dspace" schema, it MUST be specified in a "metadata_dspace.xml", similar to:
metadata_dspace.xml
<dublin_core schema="dspace">
<dcvalue element="entity" qualifier="type">Publication</dcvalue>
</dublin_core>
If you already know the UUID of an existing Entity that you want to relate to, you can also create/update the "metadata_relation.xml" file to add/update the
relationship, similar to:
metadata_relation.xml
<dublin_core schema="relation">
<dcvalue element="isAuthorOfPublication">5dace143-1238-4b4f-affb-ed559f9254bb</dcvalue>
</dublin_core>
The "relationships" file is primarily for creating relationships between Entities in the same import batch. Of course, you can also choose to use the
"relationships" file to create new relationships to existing Entities instead of creating/updating the "metadata_relation.xml" file. The main advantage of the
"metadata_relation.xml" file is that it is used both on export and import, while the "relationships" file is only used on import at this time.
Configuring metadata_[prefix].xml for a Different Schema
1. Create a separate file for the other schema named metadata_[prefix].xml, where the [prefix] is replaced with the schema's prefix.
2. Inside the xml file use the same Dublin Core syntax, but on the <dublin_core> element include the attribute schema=[prefix].
3. Here is an example for ETD metadata, which would be in the file metadata_etd.xml:
<dublin_core schema="etd">
<dcvalue element="degree" qualifier="department">Computer Science</dcvalue>
<dcvalue element="degree" qualifier="level">Masters</dcvalue>
<dcvalue element="degree" qualifier="grantor">Michigan Institute of Technology</dcvalue>
</dublin_core>
Importing Items
Before running the item importer over items previously exported from a DSpace instance, please first refer to Transferring Items Between DSpace
Instances.
Java class: org.dspace.app.itemimport.ItemImport
-m or --mapfile Where the mapfile for items can be found (name and directory)
-n or --notify Kicks off an email alert that the item(s) have been imported
The item importer is able to batch import unlimited numbers of items for a particular collection using a very simple CLI command and arguments. To add items to a collection, you will need the following:
eperson
Collection ID (either Handle (e.g. 123456789/14) or UUID)
Source directory where the items reside
Mapfile. Since you don't have one, you need to determine where it will be (e.g. /Import/Col_14/mapfile)
At the command line:
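The invocation can be sketched as follows, assuming the standard launcher (the account, Handle, and paths are illustrative):

```shell
# Add every item under items_dir to Collection 123456789/14,
# writing the directory-to-Handle mapping to /Import/Col_14/mapfile.
[dspace]/bin/dspace import --add --eperson=[email protected] \
    --collection=123456789/14 --source=items_dir --mapfile=/Import/Col_14/mapfile
```

The equivalent short form uses -a, -e, -c, -s and -m.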
The above command would cycle through the archive directory's items, import them, and then generate a map file which stores the mapping of item
directories to item handles. SAVE THIS MAP FILE. You can use it for replacing or deleting (unimporting) the mapped items.
Testing. You can add --validate (or -v) to the command to simulate the entire import process without actually doing the import. This is extremely
useful for verifying your import files before doing the actual import.
eperson
Collection ID (either Handle (e.g. 123456789/14) or Database ID (e.g. 2))
Source directory where your zipfile containing the items resides
Zipfile
Mapfile. Since you don't have one, you need to determine where it will be (e.g. /Import/Col_14/mapfile)
At the command line:
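For a zipfile-based import, the invocation described above can be sketched as (assuming the standard launcher; all names are illustrative):

```shell
# Import items from items.zip, which resides in /Import/Col_14.
[dspace]/bin/dspace import --add --eperson=[email protected] \
    --collection=123456789/14 --source=/Import/Col_14 --zip=items.zip \
    --mapfile=/Import/Col_14/mapfile
```

The equivalent short form uses -a, -e, -c, -s, -z and -m.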
or by using the short form:
The above command would unpack the zipfile, cycle through the archive directory's items, import them, and then generate a map file which stores the
mapping of item directories to item handles. SAVE THIS MAP FILE. You can use it for replacing or deleting (unimporting) the mapped items.
Testing. You can add --validate (or -v) to the command to simulate the entire import process without actually doing the import. This is extremely
useful for verifying your import files before doing the actual import.
Long form:
If you wish to replace content using a zipfile, that's also possible. The command is similar, but in this situation "-s" refers to the directory containing the zip file, and "-z" gives the name of the zipfile:
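A sketch of that replace invocation, assuming the standard launcher and the mapfile produced by the original import (all names illustrative):

```shell
# Replace previously imported (mapped) items using the contents of items.zip.
[dspace]/bin/dspace import -r -e [email protected] -c 123456789/14 \
    -s /Import/Col_14 -z items.zip -m /Import/Col_14/mapfile
```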
In long form:
Other Options
Workflow. The importer usually bypasses any workflow assigned to a collection, but adding the --workflow (-w) argument will route the imported items through the workflow system.
Templates. If you have templates that have constant data and you wish to apply that data during batch importing, add the --template (-p)
argument.
Resume. If, during importing, you have an error and the import is aborted, you can use the --resume (-R) flag to resume the import where you
left off after you fix the error.
Specifying the owning collection on a per-item basis from the command line administration tool
If you omit the -c flag, which is otherwise mandatory, the ItemImporter searches for a file named "collections" in each item directory. This file
should contain a list of collections, one per line, specified either by their handle, or by their internal db id. The ItemImporter then will put the item in
each of the specified collections. The owning collection is the collection specified in the first line of the collections file.
If both the -c flag is specified and the collections file exists in the item directory, the ItemImporter will ignore the collections file and will put the
item in the collection specified on the command line.
Since the collections file can differ between item directories, this gives you more fine-grained control of the process of batch adding items to
collections.
UI Batch Import
Available in DSpace 7.4 and above.
Batch import can also take place via the Administrator's UI. The steps to follow are:
A. Prepare the data
1. Items, i.e. the metadata and their bitstreams, must be in the Simple Archive Format described earlier in this chapter. Thus, for each item there
must be a separate directory that contains the corresponding files of the specific item.
2. Moreover, each item directory can contain another file that describes the collection or collections that this item will be added to. The name of this file must be "collections" and it is optional. It has the following format, with one collection handle per line:
123456789/10
123456789/11
The collection in the first line is the owning collection, while the rest are the other collections that the item should belong to.
3. Compress the item directories into a ZIP file. Please note that you need to zip the actual item directories and not just the directory that contains
the item directories. Thus, the final zip file must directly contain the item directories.
1. Login as an Administrator.
2. In the side menu, select "Import" > "Batch Import (ZIP)"
4. Clicking "Proceed" will start the Batch Import. This creates a new "Process" which begins the upload of the batch. Depending on the size of the batch, this process may take some time to complete. You can refresh the page to see the current status, or go back to the list of processes ("Processes" menu in the sidebar) to check on its status. Once the process is COMPLETED, you will see a log of the results and a mapfile (which can be used to make later updates).
5. All prior imports will be listed in the "Processes" menu, until their corresponding process entry is deleted. Once you are satisfied with the import
and have no need to see the logs or mapfile, you may wish to delete that process entry in order to free up storage space (as your uploaded ZIP
will be retained in DSpace until the process is deleted). A "process-cleaner" script can also be started from the "Processes" page which can be
used to bulk delete old processes.
It is also possible to start an "import" directly from the "Processes" menu. This allows you to specify additional options/flags which are normally only
available to the command-line "import" tool (see documentation above).
Exporting Items
The item exporter can export a single item or a collection of items, and creates a DSpace simple archive in the aforementioned format for each exported item. The items are exported in the sequential order in which they are retrieved from the database. As a consequence, the sequence numbers of the item subdirectories (item_000, item_001) are not related to DSpace handles or item IDs.
-t or --type Type of export. COLLECTION will inform the program you want the whole collection. ITEM will be only the specific item. (You
will actually key in the keywords in all caps. See examples below.)
-d or --dest The destination path where you want the file of items to be placed.
-n or --number Sequence number to begin with. Whatever number you give, this will be the name of the first directory created for your export.
The layout of the export directory is the same as the layout used for import.
-m or --migrate Export the item/collection for migration. This will remove the handle and any other metadata that will be re-created in the new
instance of DSpace.
Exporting a Collection
Short form:
The keyword COLLECTION means that you intend to export an entire collection. The ID can either be the database ID or the handle. The exporter will
begin numbering the simple archives with the sequence number that you supply.
To export a single item use the keyword ITEM and give the item ID as an argument:
Short form:
Each exported item will have an additional file in its directory, named "handle". This will contain the handle that was assigned to the item, and this file will
be read by the importer so that items exported and then imported to another machine will retain the item's original handle.
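Sketches of the two export invocations described above, assuming the standard launcher (the IDs, destinations, and starting sequence number are illustrative):

```shell
# Export an entire collection (by Handle), numbering archives from 1.
[dspace]/bin/dspace export -t COLLECTION -i 123456789/14 -d /Export/Col_14 -n 1

# Export a single item by its ID.
[dspace]/bin/dspace export -t ITEM -i 123456789/300 -d /Export/Item_300 -n 1
```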
The -m Argument
Using the -m argument will export the item/collection and also perform the migration step. It performs the same process that the next section, Exchanging Content Between Repositories, describes; we recommend reading that section in conjunction with using this flag.
The -x Argument
Using the -x argument will do the standard export, except that bitstreams will not be exported. If you have a full SAF without bitstreams and you have the bitstream archive (which might have been imported into DSpace earlier) somewhere nearby, you could symlink the original archive files into the SAF directories and obtain an exported collection which occupies almost no space but is otherwise identical to the exported collection (i.e. it could be imported into DSpace). In the case of huge collections, -x mode might be substantially faster than a full export.
UI Batch Export
Available in DSpace 7.4 and above.
Batch export can also take place via the Administrator's UI. The default file size upload limit is 512MB, and is configured in the Spring Boot application.properties file.
1. Login as an Administrator.
2. In the side menu, select "Export" > "Batch Export (ZIP)"
4. Clicking "Export" will start the Batch Export. This creates a new "Process" which begins the export process. Depending on the size of the export, this process may take some time to complete. You can refresh the page to see the current status, or go back to the list of processes ("Processes" menu in the sidebar) to check on its status. Once the process is COMPLETED, you will see a log of the results and an exported ZIP file which you can download.
5. All prior exports will be listed in the "Processes" menu, until their corresponding process entry is deleted. Once you are satisfied with the export
and have downloaded the ZIP, you may wish to delete that process entry in order to free up storage space (as your exported ZIP will be retained
in DSpace until the process is deleted). A "process-cleaner" script can also be started from the "Processes" page which can be used to bulk
delete old processes.
It is also possible to start an "export" directly from the "Processes" menu. This allows you to specify additional options/flags which are normally only
available from the command-line "export" tool (see documentation above). It also allows you to export a single Item.
Importing and Exporting Content via Packages
1 Package Importer and Exporter
1.1 Supported Package Formats
1.2 Ingesting
1.2.1 Ingestion Modes & Options
1.2.1.1 Ingesting a Single Package
1.2.1.2 Ingesting Multiple Packages at Once
1.2.2 Restoring/Replacing using Packages
1.2.2.1 Default Restore Mode
1.2.2.2 Restore, Keep Existing Mode
1.2.2.3 Force Replace Mode
1.3 Disseminating
1.3.1 Disseminating a Single Object
1.3.2 Disseminating Multiple Objects at Once
1.4 Archival Information Packages (AIPs)
1.5 METS packages
This mode also displays a list of the names of package ingestion and dissemination plugins that are currently installed in your DSpace. Each Packager
plugin also may allow for custom options, which may provide you more control over how a package is imported or exported. You can see a listing of all
specific packager options by invoking --help (or -h) with the --type (or -t) option:
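A sketch of that invocation for the "METS" plugin (the path assumes a standard DSpace installation):

```shell
[dspace]/bin/dspace packager --help --type METS
```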
The above example will display the normal help message, while also listing any additional options available to the "METS" packager plugin.
AIP - Ingests content which is in the DSpace Archival Information Package (AIP) format. This is used as part of the DSpace AIP Backup and
Restore process
DSPACE-ROLES - Ingests DSpace users/groups in the DSPACE-ROLES XML Schema. This is primarily used by the DSpace AIP Backup and
Restore process to ingest/replace DSpace Users & Groups.
METS - Ingests content which is in the DSpace METS SIP format
PDF - Ingests a single PDF file (where basic metadata is extracted from the file properties in the PDF Document).
AIP - Exports content which is in the DSpace Archival Information Package (AIP) format. This is used as part of the DSpace AIP Backup and
Restore process
DSPACE-ROLES - Exports DSpace users/groups in the DSPACE-ROLES XML Schema. This is primarily used by the DSpace AIP Backup and
Restore process to export DSpace Users & Groups.
METS - Exports content in the DSpace METS SIP format
For a list of all package ingestion and dissemination plugins that are currently installed in your DSpace, you can execute:
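A sketch of the command (the path assumes a standard DSpace installation):

```shell
[dspace]/bin/dspace packager --help
```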
Some package ingestion and dissemination plugins also have custom options/parameters. For example, to see a listing of the custom options for the
"METS" plugin, you can execute:
Ingesting
Ingestion Modes & Options
When ingesting packages, DSpace supports several different "modes". (Please note that not all packager plugins may support all modes of ingestion.)
1. Submit/Ingest Mode (-s option, default) – submit package to DSpace in order to create a new object(s)
2. Restore Mode (-r option) – restore pre-existing object(s) in DSpace based on package(s). This also attempts to restore all handles and
relationships (parent/child objects). This is a specialized type of "submit", where the object is created with a known Handle and known
relationships.
3. Replace Mode (-r -f option) – replace existing object(s) in DSpace based on package(s). This also attempts to restore all handles and
relationships (parent/child objects). This is a specialized type of "restore" where the contents of existing object(s) are replaced by the contents in
the AIP(s). By default, if a normal "restore" finds the object already exists, it will back out (i.e. roll back all changes) and report which object already
exists.
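A sketch of the general submit/ingest invocation, using the placeholders explained below:

```shell
[dspace]/bin/dspace packager -e [user-email] -p [parent-handle] -t [packager-name] /full/path/to/package
```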
Where [user-email] is the e-mail address of the E-Person under whose authority this runs; [parent-handle] is the Handle of the Parent Object into which the
package is ingested, [packager-name] is the plugin name of the package ingester to use, and /full/path/to/package is the path to the file to ingest (or "-" to
read from the standard input).
Here is an example that loads a PDF file with internal metadata as a package:
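A sketch of such a command; the parent handle and filename are illustrative:

```shell
[dspace]/bin/dspace packager -e [email protected] -p 4321/10 -t PDF thesis.pdf
```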
This example takes the result of retrieving a URL and ingests it:
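A sketch using wget to pipe the downloaded file to the packager's standard input ("-"); the URL is illustrative:

```shell
wget -O - http://www.example.com/thesis.pdf | [dspace]/bin/dspace packager -e [email protected] -p 4321/10 -t PDF -
```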
For a Site-based package - this would ingest all Communities, Collections & Items based on the located package files
For a Community-based package - this would ingest that Community and all SubCommunities, Collections and Items based on the located
package files
For a Collection - this would ingest that Collection and all contained Items based on the located package files
For an Item – this just ingests the Item (including all Bitstreams & Bundles) based on the package file.
For example:
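A sketch of the command; the package name and parent handle match the description below, and -a enables recursive ingest of child packages:

```shell
[dspace]/bin/dspace packager -s -a -t AIP -e [email protected] -p 4321/12 collection-aip.zip
```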
The above command will ingest the package named "collection-aip.zip" as a child of the specified Parent Object (handle="4321/12"). The resulting object
is assigned a new Handle (since -s is specified). In addition, any child packages directly referenced by "collection-aip.zip" are also recursively ingested (a
new Handle is also assigned for each child AIP).
Because the packager plugin must know how to locate all child packages from an initial package file, not all plugins can support bulk ingest. Currently, in
DSpace the following Packager Plugins support bulk ingest capabilities:
There are currently three restore modes:
1. Default Restore Mode (-r) = Attempt to restore object (and optionally children). Rollback all changes if any object is found to already exist.
2. Restore, Keep Existing Mode (-r -k) = Attempt to restore object (and optionally children). If an object is found to already exist, skip over it (and
all children objects), and continue to restore all other non-existing objects.
3. Force Replace Mode (-r -f) = Restore an object (and optionally children) and overwrite any existing objects in DSpace. Therefore, if an object
is found to already exist in DSpace, its contents are replaced by the contents of the package. WARNING: This mode is potentially dangerous as it
will permanently destroy any object contents that do not currently exist in the package. You may want to first perform a backup, unless you are
sure you know what you are doing!
For example:
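A sketch of a Default Restore Mode command, matching the description below:

```shell
[dspace]/bin/dspace packager -r -t AIP -e [email protected] aip4567.zip
```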
Notice that unlike the -s option (for submission/ingesting), the -r option does not require the Parent Object (-p option) to be specified if it can be determined
from the package itself.
In the above example, the package "aip4567.zip" is restored to the DSpace installation with the Handle provided within the package itself (and added as a
child of the parent object specified within the package itself). If the object is found to already exist, all changes are rolled back (i.e. nothing is restored to
DSpace).
One special case to note: If a Collection or Community is found to already exist, its child objects are also skipped over. So, this mode will not auto-restore
items to an existing Collection.
For example:
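A sketch matching the description below: -a also restores all child packages, and -k keeps any existing objects:

```shell
[dspace]/bin/dspace packager -r -a -k -t AIP -e [email protected] aip4567.zip
```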
In the above example, the package "aip4567.zip" is restored to the DSpace installation with the Handle provided within the package itself (and added as a
child of the parent object specified within the package itself). In addition, any child packages referenced by "aip4567.zip" are also recursively restored (the -
a option specifies to also restore all child packages). They are also restored with the Handles & Parent Objects provided with their packages. If any object is
found to already exist, it is skipped over (child objects are also skipped). All non-existing objects are restored.
Because this mode actually destroys existing content in DSpace, it is potentially dangerous and may result in data loss! It is recommended to always
perform a full backup (assetstore files & database) before attempting to replace any existing object(s) in DSpace.
For example:
[dspace]/bin/dspace packager -r -f -t AIP -e [email protected] aip4567.zip
In the above example, the package "aip4567.zip" is restored to the DSpace installation with the Handle provided within the package itself (and added as a
child of the parent object specified within the package itself). In addition, any child packages referenced by "aip4567.zip" are also recursively ingested.
They are also restored with the Handles & Parent Objects provided with their package. If any object is found to already exist, its contents are replaced by
the contents of the appropriate package.
If any error occurs, the script attempts to rollback the entire replacement process.
Disseminating
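A sketch of the general dissemination invocation, using the placeholders explained below:

```shell
[dspace]/bin/dspace packager -d -e [user-email] -i [handle] -t [packager-name] [file-path]
```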
Where [user-email] is the e-mail address of the E-Person under whose authority this runs; [handle] is the Handle of the Object to disseminate; [packager-
name] is the plugin name of the package disseminator to use; and [file-path] is the path to the file to create (or "-" to write to the standard output). For
example:
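A sketch of such a command; the handle and filename match the description below:

```shell
[dspace]/bin/dspace packager -d -e [email protected] -i 4321/4567 -t METS 4567.zip
```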
The above code will export the object of the given handle (4321/4567) into a METS file named "4567.zip".
For example:
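A sketch matching the description below; -a additionally disseminates all child objects:

```shell
[dspace]/bin/dspace packager -d -a -e [email protected] -i 4321/4567 -t METS 4567.zip
```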
The above code will export the object of the given handle (4321/4567) into a METS file named "4567.zip". In addition it would export all children objects to
the same directory as the "4567.zip" file.
This feature came out of a requirement for DSpace to better integrate with DuraCloud, and other backup storage systems. One of these requirements is to
be able to essentially "backup" local DSpace contents into the cloud (as a type of offsite backup), and "restore" those contents at a later time.
Essentially, this means DSpace can export the entire hierarchy (i.e. bitstreams, metadata and relationships between Communities/Collections/Items) into a
relatively standard format (a METS-based, AIP format). This entire hierarchy can also be re-imported into DSpace in the same format (essentially a restore
of that content in the same or different DSpace installation).
For more information, see the section on AIP backup & Restore for DSpace.
METS packages
Since the DSpace 1.4 release, the software has included a package disseminator and matching ingester for the DSpace METS SIP (Submission Information
Package) format. They were created to help end users prepare sets of digital resources and metadata for submission to the archive using well-defined
standards such as METS, MODS, and PREMIS. The plugin name is METS by default, and it uses MODS for descriptive metadata.
Configurable Workflow
1 Introduction
2 How to Configure your Workflows
2.1 WORKFLOWS
2.2 STEPS
2.3 ROLES
2.4 ACTIONS
2.5 CURATION
2.6 HOW IT WORKS
3 Data Migration
3.1 Workflowitem conversion/migration scripts
3.1.1 Automatic migration
3.1.2 Java based migration
4 Configuration
4.1 Main workflow configuration
4.1.1 workflowFactory bean (org.dspace.xmlworkflow.XmlWorkflowFactoryImpl)
4.1.2 workflow beans (org.dspace.xmlworkflow.state.Workflow)
4.1.3 role beans (org.dspace.xmlworkflow.Role)
4.1.4 step beans (org.dspace.xmlworkflow.state.Step)
4.2 Workflow actions configuration
4.2.1 API configuration
4.2.1.1 User Selection Action
4.2.1.2 Processing Action
5 Authorizations
6 Database
6.1 cwf_workflowitem
6.2 cwf_collectionrole
6.3 cwf_workflowitemrole
6.4 cwf_pooltask
6.5 cwf_claimtask
6.6 cwf_in_progress_user
7 Additional workflow steps/actions and features
7.1 Optional workflow steps: Select single reviewer workflow
7.2 Optional workflow steps: Score review workflow
7.3 Workflow overview features
Introduction
Workflows can be used to define how documents should be reviewed or edited after being submitted and/or imported into DSpace. The primary focus of
the workflow framework is to create a more flexible solution for the administrator to configure, and even to allow an application developer to implement
custom steps, which may be configured in the workflow for the collection through a simple configuration file. Each workflow can be compared to an action
that is performed on an item between its submission to the repository and the moment it is archived and published in the repository. The concept behind
this approach was modeled on the configurable submission system already present in DSpace.
Each collection is associated with a workflow. If no explicit association is made, the collection is assigned the default workflow. These associations are
configured in config/spring/api/workflow.xml using the workflowMapping property of the XmlWorkflowFactory bean. To make an explicit
association, add an entry to the list with the collection's Handle as the 'key' and the 'name' of a Workflow bean as the 'value-ref'.
Each step in a workflow is associated with a "role" which defines who can perform that step. Role members will be notified when a new submission needs
their attention. Roles are defined by DSpace user groups. If you wish to have reviewers interact with incoming submissions, you must create and fill the
necessary groups. See below for details.
WORKFLOWS
To create a new workflow, add another bean with the 'class' 'org.dspace.xmlworkflow.state.Workflow' and a unique 'name'. Give it a 'steps' property
containing a list of the steps that should be entered in sequence, and a 'firstStep' property which names the step to be entered first. See the default
workflow for an example. An existing step may be re-used if appropriate, or you can create one to suit.
STEPS
Aside from its name, a step has a "user selection method", a "role", "actions" and "outcomes".
A step's 'userSelectionMethod' is the name of an "action" of the user-selection type. A step may, for example, let itself be claimed (for a given submission)
by a single user, or it may combine the actions of multiple users. A step has exactly one 'userSelectionMethod'. See more on actions below.
A step's role defines the set of users who may perform actions on a submission that has entered that step. See more on roles below.
A step's actions are the types of work that are done in the step. See more on actions below. More than one action may be listed.
A step's outcomes connect the role members' decisions with the next step to be performed. For example, this allows a role member to accept a
submission and skip subsequent steps by going directly to the final step in the workflow.
To create a new step, add a bean with 'class' org.dspace.xmlworkflow.state.Step and the necessary properties, as discussed above. See the existing
steps in workflow.xml for examples.
ROLES
You may re-use existing roles, or add your own. A role has a 'name', a 'scope', and optionally a 'description'. There are three kinds of roles:
A COLLECTION role refers to a user group associated with a specific collection. It will be named {collectionID}_{roleName}. For example, a role
'editor' with COLLECTION scope, applied to collection 123, will refer to the user group named 'editor_123', while the same role applied to
collection 456 will refer to the user group 'editor_456'.
A REPOSITORY role refers to a fixed user group, whose name is the role's name. A REPOSITORY role named 'fred' will always refer to the user
group 'fred'.
An ITEM role is assigned by a previous action in the workflow. [NEEDS MORE EXPLANATION]
To create a new role, add a bean with 'class' org.dspace.xmlworkflow.Role, the appropriate 'scope', and a unique 'name'. Be sure that the related groups
exist.
ACTIONS
Actions are defined separately in 'config/spring/api/workflow-actions.xml'.
A number of actions are already defined, and these should serve most needs. Actions are implemented in Java code, so if you need a new one then you
will need to write some Java in addition to configuring it here.
There are two kinds of actions: user assignment and processing. A user assignment action selects one or more role members to execute a step. A
processing action modifies the state of the submission.
To configure a new Action, create a bean with a unique 'id', 'class' equal to the fully qualified name of the Java class which implements the action, and
'scope' "prototype". Add properties, constructor arguments, etc. as required by the code.
CURATION
To attach a Curation Task to a workflow step, see Curation System. Tasks are executed at the beginning of a step, before role members are notified.
HOW IT WORKS
For details of how these concepts are implemented (for example, to create new actions) see the Workflow page under DSpace Development.
Data Migration
As of DSpace 7, Configurable Workflow is the only workflow system available in DSpace. It has fully replaced the older "traditional/basic workflow"
system. One major difference is that Configurable Workflow is dynamic – if a user is added to a workflow approval task after a workflow has already begun,
they will immediately get access to any existing items in workflow. Previously, this was not possible in the "traditional" workflow system.
Automatic migration
As part of the upgrade to DSpace 7 or above, all your old policies, roles, tasks and workflowitems will be automatically updated from the original workflow
to the Configurable Workflow framework. This is done via this command:
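The command (the path assumes a standard DSpace installation):

```shell
[dspace]/bin/dspace database migrate ignored
```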
The "ignored" parameter will tell DSpace to run any previously-ignored migrations on your database. As the Configurable Workflow migrations have
existed in the DSpace codebase for some time, this is the only way to force them to be run.
For more information on the "database migrate" command, please see Database Utilities.
Java based migration
In case your DSpace installation uses a customized version of the workflow, the migration script might not work properly and a different approach is
recommended. For this situation, an additional Java based script has been created that restarts the workflow for all workflowitems that exist in the original
workflow framework. The script takes all existing workflowitems and places them in the first step of the configurable workflow framework, taking
into account the XML configuration that exists at that time for the collection to which each item was submitted. Note that this script can only be used to
restart the workflow for workflowitems in the original workflow; it cannot restart the workflow for items already in the configurable workflow.
Configuration
The workflow main configuration can be found in the workflow.xml file, located in [dspace]/config/spring/api/workflow.xml . An example of
this workflow configuration file can be found below.
<beans>
    <bean class="org.dspace.xmlworkflow.XmlWorkflowFactoryImpl">
        <property name="workflowMapping">
            <util:map>
                <entry key="defaultWorkflow" value-ref="defaultWorkflow"/>
                <!-- <entry key="123456789/4" value-ref="selectSingleReviewer"/> -->
                <!-- <entry key="123456789/5" value-ref="scoreReview"/> -->
            </util:map>
        </property>
    </bean>

    <bean id="{workflow.id}" class="org.dspace.xmlworkflow.state.Workflow">
        <!-- Another workflow configuration -->
    </bean>
</beans>
key: can either be a collection handle or "defaultWorkflow"
value-ref: the value of this attribute points to one of the workflow configurations defined by the "Workflow" beans
"name" attribute: a unique name used for the identification of the workflow and used in the workflow to collection mapping
"firstStep" property: the identifier of the first step of the workflow. This step will be the entry point of this workflow-process. When a new item
has been committed to a collection that uses this workflow, the step configured in the "firstStep" property will he the first step the item will go
through.
"steps" property: a list of all steps within this workflow (in the order they will be processed).
"id" attribute: a unique identifier (in one workflow process) for the role
"description" property: optional attribute to describe the role
"scope" property: optional attribute that is used to find our group and must have one of the following values, which are defined as constant fields
of org.dspace.xmlworkflow.Role.Scope:
COLLECTION: The collection value specifies that the group will be configured at the level of the collection. This type of groups is the
same as the type that existed in the original workflow system. In case no value is specified for the scope attribute, the workflow
framework assumes the role is a collection role.
REPOSITORY: The repository scope uses groups that are defined at repository level in DSpace. The name attribute should exactly
match the name of a group in DSpace.
ITEM: The item scope assumes that a different action in the workflow will assign a number of EPersons or Groups to a specific workflow-
item in order to perform a step. These assignees can be different for each workflow item.
"name" property: The name specified in the name attribute of a role will be used to lookup an eperson group in DSpace. The lookup will depend
on the scope specified in the "scope" attribute:
COLLECTION: The workflow framework will look for a group containing the name specified in the name attribute and the ID of the
collection for which this role is used.
REPOSITORY: The workflow framework will look for a group with the same name as the name specified in the name attribute.
ITEM: in case the item scope is selected, the name of the role attribute is not required.
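A sketch of a Role bean following the properties above; the name and description are illustrative, and for COLLECTION scope the matching per-collection groups must exist:

```xml
<bean id="reviewer" class="org.dspace.xmlworkflow.Role">
    <!-- looked up as "{roleName}_{collectionID}" for COLLECTION scope -->
    <property name="name" value="Reviewer"/>
    <property name="scope" value="#{ T(org.dspace.xmlworkflow.Role.Scope).COLLECTION}"/>
    <property name="description" value="The people responsible for reviewing submissions to this collection"/>
</bean>
```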
"name" attribute: The name attribute specifies a unique identifier for the step. This identifier will be used when configuring other steps in order to
point to this step. This identifier can also be used when configuring the start step of the workflow item.
"userSelectionMethod" property: This attribute defines the UserSelectionAction that will be used to determine how to attach users to this
step for a workflow-item. The value of this attribute must refer to the identifier of an action bean in the workflow-actions.xml. Examples of the user
attachment to a step are the currently used system of a task pool or as an alternative directly assigning a user to a task.
"role" property: optional attribute that must point to the id attribute of a role element specified for the workflow. This role will be used to define
the epersons and groups used by the userSelectionMethod.
"requiredUsers" property: optional attribute specifying how many users must complete the step before the workflow item proceeds (used, for example, by the score review step).
<bean name="reviewstep" class="org.dspace.xmlworkflow.state.Step">
    <property name="userSelectionMethod" ref="claimaction"/>
    <property name="role" ref="reviewer"/>
    <property name="outcomes">
        <util:map>
            <entry key="#{ T(org.dspace.xmlworkflow.state.actions.ActionResult).OUTCOME_COMPLETE}"
                   value-ref="editstep"/>
        </util:map>
    </property>
    <property name="actions">
        <util:list>
            <ref bean="reviewaction"/>
        </util:list>
    </property>
</bean>
Each step contains a number of actions that the workflow item will go through. In case an action has a user interface, the users responsible for the
execution of this step will have to execute these actions before the workflow item can proceed to the next action or the end of the step.
There is also an optional subsection that can be defined for a step, called "outcomes". This can be used to define outcomes for the step that differ
from the default one. Each action returns an integer depending on the result of the action. The default value is "0" and will make
the workflow item proceed to the next action or to the end of the step.
In case an action returns an outcome other than the default "0", the alternative outcomes will be used to look up the next step. The "outcomes" element
maps each such return value of an action to a step; the mapped step is the next one the workflow item will go through in case an action returns that
specified value.
API configuration
The workflow actions configuration is located in the [dspace]/config/spring/api/ directory and is named "workflow-actions.xml". This
configuration file describes the different Action Java classes that are used by the workflow framework. Because the workflow framework uses the Spring
framework for loading these action classes, this configuration file contains Spring configuration.
This file contains the beans for the actions and user selection methods referred to in the workflow.xml. In order for the workflow framework to work
properly, each of the required actions must be part of this configuration.
<beans
    xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:util="http://www.springframework.org/schema/util"
    xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.0.xsd
                        http://www.springframework.org/schema/util http://www.springframework.org/schema/util/spring-util-2.0.xsd">

    <!-- Below the class identifiers come the declarations for our actions/userSelectionMethods -->
User selection action: This type of action is always the first action of a step and is responsible for the user selection process of that step. In case a
step has no role attached, no user will be selected and the NoUserSelectionAction is used.
Processing action: This type of action is used for the actual processing of a step. Processing actions contain the logic required to execute the
required operations in each step. Multiple processing actions can be defined in one step. The users and the workflow item will go through these
actions in the order they are specified in the workflow configuration, unless an alternative outcome is returned by one of them.
This bean defines a new UserSelectionActionConfig and the following child tags:
constructor-arg: This is a constructor argument containing the ID of the task. This is the same as the id attribute of the bean and is used by the
workflow configuration to refer to this action.
property processingAction: This tag refers to the ID of the API bean responsible for the implementation of the API side of this action. This
bean should also be configured in this XML.
property requiresUI: In case this property is true, the workflow framework will expect a user interface for the action. Otherwise the framework
will automatically execute the action and proceed to the next one.
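A sketch of such a bean, based on the tags described above; the bean ids are illustrative, and the claimaction/claimactionAPI pairing follows the task-pool claim pattern referenced in workflow.xml:

```xml
<bean id="claimaction"
      class="org.dspace.xmlworkflow.state.actions.UserSelectionActionConfig"
      scope="prototype">
    <!-- the task ID, matching this bean's id, used by workflow.xml to refer to this action -->
    <constructor-arg type="java.lang.String" value="claimaction"/>
    <!-- the API bean implementing the API side of this action -->
    <property name="processingAction" ref="claimactionAPI"/>
    <!-- true: a user interface is expected; false: executed automatically -->
    <property name="requiresUI" value="true"/>
</bean>
```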
Processing Action
Processing actions are configured similarly to the user selection actions. The only difference is that these processing action beans are implementations of
the WorkflowActionConfig class instead of the UserSelectionActionConfig class.
Authorizations
Currently, the authorizations are always granted and revoked based on the tasks that are available for certain users and groups. The set of authorization
policies granted for each of these is always the same:
READ
WRITE
ADD
DELETE
Database
The workflow uses a separate metadata schema named workflow. The fields this schema contains can be found in the file workflow-types.xml in the
[dspace]/config/registries directory. At the moment this schema is only used by the score reviewing system, but one
could also use it if metadata is required for custom workflow steps.
The following tables have been added to the DSpace database. All tables are prefixed with 'cwf_' to avoid any confusion with the existing workflow related
database tables:
cwf_workflowitem
The cwf_workflowitem table contains the different workflowitems in the workflow. This table has the following columns:
workflowitem_id: The identifier of the workflowitem and primary key of this table
item_id: The identifier of the DSpace item to which this workflowitem refers.
collection_id: The collection to which this workflowitem is submitted.
multiple_titles: Specifies whether the submission has multiple titles (important for submission steps)
published_before: Specifies whether the submission has been published before (important for submission steps)
multiple_files: Specifies whether the submission has multiple files attached (important for submission steps)
cwf_collectionrole
The cwf_collectionrole table represents a workflow role for one collection. This type of role is the same as the roles that existed in the original workflow,
meaning that for each collection a separate group is defined to describe the role. The cwf_collectionrole table has the following columns:
collectionrol_id: The identifier of the collectionrole and the primary key of this table
role_id: The identifier/name used by the workflow configuration to refer to the collectionrole
collection_id: The collection identifier for which this collectionrole has been defined
group_id: The group identifier of the group that defines the collection role
cwf_workflowitemrole
The cwf_workflowitemrole table represents roles that are defined at the level of an item. These roles are temporary roles and only exist during the
execution of the workflow for that specific item. Once the item is archived, the workflowitemrole is deleted. Multiple rows can exist for one workflowitem,
with e.g. one row containing a group and a few containing epersons. All these rows together make up the workflowitemrole. The cwf_workflowitemrole table
has the following columns:
workflowitemrole_id: The identifier of the workflowitemrole and the primary key of this table
role_id: The identifier/name used by the workflow configuration to refer to the workflowitemrole
workflowitem_id: The cwf_workflowitem identifier for which this workflowitemrole has been defined
group_id: The group identifier of the group that defines the workflowitemrole role
eperson_id: The eperson identifier of the eperson that defines the workflowitemrole role
cwf_pooltask
The cwf_pooltask table represents the different task pools that exist for a workflowitem. These task pools can be available at the beginning of a step and
contain all the users that are allowed to claim a task in this step. Multiple rows can exist for one task pool containing multiple groups and epersons. The
cwf_pooltask table has the following columns:
pooltask_id: The identifier of the pooltask and the primary key of this table
workflowitem_id: The identifier of the workflowitem for which this task pool exists
workflow_id: The identifier of the workflow configuration used for this workflowitem
step_id: The identifier of the step for which this task pool was created
action_id: The identifier of the action that needs to be displayed/executed when the user selects the task from the task pool
eperson_id: The identifier of an eperson that is part of the task pool
group_id: The identifier of a group that is part of the task pool
cwf_claimtask
The cwf_claimtask table represents a task that has been claimed by a user. Claimed tasks can be assigned to users or can be the result of a claim from
the task pool. Because a step can contain multiple actions, the claimed task defines the action at which the user has arrived in a particular step. This
makes it possible to stop working halfway through a step and continue later. The cwf_claimtask table contains the following columns:
claimtask_id: The identifier of the claimtask and the primary key of this table
workflowitem_id: The identifier of the workflowitem for which this task exists
workflow_id: The id of the workflow configuration that was used for this workflowitem
step_id: The step that is currently processing the workflowitem
action_id: The action that should be executed by the owner of this claimtask
owner_id: References the eperson that is responsible for the execution of this task
cwf_in_progress_user
The cwf_in_progress_user table keeps track of the different users that are performing a certain step. This table is used because some steps might require
multiple users to perform the step before the workflowitem can proceed. The cwf_in_progress_user table contains the following columns:
in_progress_user_id: The identifier of the in progress user and the primary key of this table
workflowitem_id: The identifier of the workflowitem for which the user is performing or has performed the step.
user_id: The identifier of the eperson that is performing or has performed the task
finished: Keeps track of the fact that the user has finished the step or is still in progress of the execution
Additional workflow steps/actions and features
These optional workflow steps are pre-defined in the "workflow.xml" but are not used by default.
selectReviewerStep: During this step, a user has the ability to select a responsible user to review the workflowitem. This means that for each
workflowitem, a different user can be selected. Because a user is assigned, the task pool is no longer required.
The available users to select from are defined in the "action.selectrevieweraction.group" setting in workflow.cfg. This setting must list the
name of a group of reviewers to select from (default value = "Reviewers" group).
singleUserReviewStep: The start of the review step differs from the typical task pool. Instead of having a task pool, the user will be
automatically assigned to the task. However, the user still has the option to reject the task (in case he or she is not responsible for the assigned
task) or review the item. In case the user rejects the task, the workflowitem will be sent to another step in the workflow as an alternative to the
default outcome.
In workflow.cfg, there's an option to allow these reviewers to be able to edit files & metadata. When "workflow.reviewer.file-edit=true",
the selected user will be allowed to edit the workflow item. By default they cannot.
scoreReviewStep: The group of users responsible for score reviewing can claim the task from the task pool. Depending on the
configuration, a different number of users can be required to execute the task (default is requiredusers=2). This means that the task remains
available in the task pool until at least the required number of users has claimed it. Once all of them have finished the task, the next
(automatic) processing step is activated.
In workflow.cfg, there's an option to allow these reviewers to be able to edit files & metadata. When "workflow.reviewer.file-edit=true",
the selected user will be allowed to edit the workflow item. By default they cannot.
evaluationStep: During the evaluation step, no user interface is required. The workflow system automatically executes the step that evaluates
the different scores (each of which corresponds to a rating from 1-5). If the average score exceeds the configured minimumAcceptanceScore,
the item is approved; otherwise it is rejected. (The minimum average score is set by adjusting the minimumAcceptanceScore property passed
to evaluationactionAPI in config/spring/api/workflow-actions.xml.)
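Putting the settings mentioned above together, the relevant configuration might look like the following sketch. The property names are taken from the text above, but the exact bean id and class name should be verified against your own config/spring/api/workflow-actions.xml:

```
# config/modules/workflow.cfg (fragment; values shown are illustrative)
# Group whose members can be chosen in selectReviewerStep
action.selectrevieweraction.group = Reviewers
# Allow selected reviewers to edit files & metadata of the workflow item
workflow.reviewer.file-edit = false
```

```xml
<!-- config/spring/api/workflow-actions.xml (fragment; bean id and class are assumptions) -->
<bean id="evaluationactionAPI"
      class="org.dspace.xmlworkflow.state.actions.processingaction.ScoreEvaluationAction"
      scope="prototype">
    <!-- Items whose average score does not reach this value are rejected -->
    <property name="minimumAcceptanceScore" value="3"/>
</bean>
```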
Submission User Interface
This page explains various customization and configuration options that are available within DSpace for the Item Submission user interface.
The name and structure of the Submission configuration files changed in 7.x. The DSpace 6.x (and below) "item-submission.xml" and "input-forms.xml"
configuration files are no longer supported. In 7.x and above, the format of the "item-submission.xml" file has been updated, and the older "input-forms.xml" has been replaced by a new "submission-forms.xml".
You can choose to either start fresh with the new v7 configuration files (see documentation below) or use the "./dspace submission-forms-migrate"
script to migrate your old configurations into new ones. See the Upgrading DSpace guide (step on "Update your DSpace Configurations") for more
information on using the migration script.
1. "Select Collection" (id="collection"), appears as dropdown: If not already selected, the user must select a collection to deposit the Item into. As
of DSpace 7, you can also change the Collection you are submitting into at any time. However, be aware that some metadata may be lost if
the Collection you switch to uses a different submission form and you already began entering metadata in the current submission.
2. "Describe" sections (id="traditionalpageone" and "traditionalpagetwo"): This is where the user may enter descriptive metadata about the Item.
This step may consist of one or more sections of metadata entry. By default, there are two sections of metadata entry. For information on
modifying the metadata entry pages, please see Custom Metadata-entry Pages for Submission section below.
3. "Upload" section (id="upload"): This is where the user may upload one or more files to associate with the Item. As of DSpace 7, you can also
drag and drop files anywhere on the page to trigger an upload. For more information on file upload, also see Configuring the File Upload step
below.
4. "License" section (id="license"): This is where the user must agree to the repository distribution license in order to complete the deposit. This
repository distribution license is defined in the [dspace]/config/default.license file. It can also be customized per-collection from the
Collection Edit UI.
5. "Deposit" button: Once all required fields/sections are completed, the "Deposit" button becomes enabled. After clicking it, the new Item will
either become immediately available or undergo a workflow approval process (depending on the Collection policies). For more information on the
workflow approval process, see Configurable Workflow.
To modify or reorganize these submission steps, just modify the [dspace]/config/item-submission.xml file. Please see the section below on Reordering/Removing/Adding Submission Steps.
You can also choose to have different submission processes for different DSpace Collections. For more details, please see the section below on Assigning
a custom Submission Process to a Collection.
Optional Steps
DSpace also ships with several optional steps which you may choose to enable if you wish. In no particular order:
"Item Access" (or Embargo) section (id="itemAccessConditions"): Only available in 7.2 or above. This step allows the user to (optionally)
modify access rights or set an embargo during the deposit of an Item. For more information on this step, and Embargo options in general, please
see the Embargo documentation.
"CC License" section (id="cclicense"): This step allows the user to (optionally) assign a Creative Commons license to a particular Item. Please
see the Configuring Creative Commons License section of the Configuration documentation for more details.
"Extraction" section (id="extractionstep"): This step will automatically attempt to extract metadata from uploaded files. Currently it only supports
bibliographic formats documented in Importing Items via basic bibliographic formats (Endnote, BibTex, RIS, TSV, CSV) and online services (OAI,
arXiv, PubMed, CrossRef, CiNii). Any extracted metadata is immediately populated in the submission form without notifying the user.
By default this step is disabled, because populating metadata automatically (without notifying the user) can sometimes result in
duplicate metadata in the submission form.
The behavior of this step can be more fully configured via the 'config/spring/api/step-processing-listener.xml' configuration file.
NOTE: this action is also only triggered when a request is performed (e.g. when a file is uploaded or the submission form is saved). You
can configure the Angular UI to autosave based on a timer in order to force this action to be done more regularly.
Various Configurable Entities related steps: These steps are "Describe" steps that are specific to different Entity types. They provide a list of
metadata fields of specific interest to those Entities.
To enable any of these optional submission steps, just uncomment the step definition within the [dspace]/config/item-submission.xml file.
Please see the section below on Reordering/Removing/Adding Submission Steps.
You can also choose to enable certain steps only for specific DSpace Collections. For more details, please see the section below on Assigning a custom
Submission Process to a Collection.
Step definitions under <step-definitions> now use the <step-definition> tag (previously, in 6.x, the tag was named <step>)
Every step definition now needs to be defined under <step-definitions> (previously, in 6.x, you could also define them in <submission-process>),
and have a unique ID
Each <step-definition> now only represents a single "section" of the Submission UI. (previously, in 6.x, some steps like Describe represented
multiple pages)
An attribute "mandatory=[true|false]" was added to the <step> element. When true, that section is always displayed to the user. When false, it's
not displayed by default, but instead must be activated explicitly by the user by choosing to add the section in the Submission UI.
The old <workflow-editable> element has been replaced with a <scope> element which defines when/how this <step> should be displayed.
Because this file is in XML format, you should be familiar with XML before editing this file. By default, this file contains the "traditional" Item Submission
Process for DSpace, which consists of the following Steps (in this order):
Select Collection -> Describe (two steps) -> Upload -> License -> Complete
If you would like to customize the steps used or the ordering of the steps, you can do so within the <submission-definition> section of the item-submission.xml file.
In addition, you may also specify different Submission Processes for different DSpace Collections. This can be done in the <submission-map> section. The
item-submission.xml file itself documents the syntax required to perform these configuration changes.
This section allows all <step> definitions to be defined globally (i.e. so they may be used in multiple <submission-process> definitions). Steps
defined in this section must define a unique id which can be used to reference this step.
For example:
<step-definitions>
<step-definition id="custom-step">
...
</step-definition>
...
</step-definitions>
The above step definition could then be referenced from within a <submission-process> as simply <step id="custom-step"/>
For example, the following defines a Submission Process where the License step directly precedes the Describe step (more information about the
structure of the information under each <step> tag can be found in the section on Structure of the <step> Definition below):
<submission-process>
<!--Step 1 will be to Sign off on the License-->
<step id="license"/>
...[other steps]...
</submission-process>
Each step contains the following elements/attributes. The required elements are so marked:
mandatory (attribute): [true|false] When true, the step's section is displayed by default to all users in the UI. When false, the step is not displayed
and must be activated explicitly by the user by selecting it in the UI or supplying data of interest to the section.
heading: Partial i18n key (defined in the UI's language packs) which corresponds to the text that should be displayed in the section header for this
step. This partial i18n key is prefixed with "submission.sections.". Therefore, the full i18n key is "submission.sections.[heading]" in the User
Interface's language packs (e.g. en.json5 for English)
processing-class (Required): Full Java path to the Processing Class for this Step. This Processing Class must perform the primary processing
of any information gathered in this step. All valid step processing classes must extend the abstract org.dspace.submit.
AbstractProcessingStep class (or alternatively, extend one of the pre-existing step processing classes in org.dspace.submit.step.*)
type (Required): The type of step defined. Most steps are of type "submission-form", which means they directly map to a <form> defined in the
submission-forms.xml configuration file. In this situation, the <step-definition> "id" attribute MUST map to a <form> "name" attribute
defined in submission-forms.xml. Any value is allowed, but only "submission-form" has a special meaning at this time.
scope: Optionally, allows you to limit the "scope" of this particular step, and define whether the step is visible outside that scope. Valid scope
values include "submission" (limited to the submission form) and "workflow" (limited to workflow approval process).
"visibility" attribute defines the visibility of the step while within the given scope. Can be set to "read-only" (in this scope you can see this
step but not edit it), or "hidden" (in this scope you cannot see this step).
"visibilityOutside" attribute defines the visibility of the step while outside the given scope. Can be set to "read-only" (in other scopes you
can see this step but not edit it), or "hidden" (in other scopes you cannot see this step).
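Taken together, a hypothetical step definition using these elements might look like the following sketch. The id, heading, and processing class shown are illustrative, not DSpace defaults, so check your own item-submission.xml for the exact classes in use:

```xml
<!-- A hypothetical <step-definition> combining the elements described above -->
<step-definition id="custom-review-notes" mandatory="false">
    <!-- Full i18n key becomes "submission.sections.custom-review-notes" -->
    <heading>custom-review-notes</heading>
    <processing-class>org.dspace.app.rest.submit.step.DescribeStep</processing-class>
    <!-- type "submission-form": the id above must match a <form name="..."> in submission-forms.xml -->
    <type>submission-form</type>
    <!-- Visible read-only during workflow review, hidden in every other scope -->
    <scope visibility="read-only" visibilityOutside="hidden">workflow</scope>
</step-definition>
```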
Reordering steps
1. Locate the <submission-process> tag which defines the Submission Process that you are using. If you are unsure which Submission Process
you are using, it's likely the one with name="traditional", since this is the traditional DSpace submission process.
2. Reorder the <step> tags within that <submission-process> tag. Be sure to move the entire <step> tag.
Removing one or more steps
1. Locate the <submission-process> tag which defines the Submission Process that you are using. If you are unsure which Submission Process
you are using, it's likely the one with name="traditional", since this is the traditional DSpace submission process.
2. Comment out (i.e. surround with <!-- and -->) the <step> tags which you want to remove from that <submission-process> tag. Be sure to
comment out the entire <step> tag.
Hint: You cannot remove the "collection" step, as a DSpace Item cannot exist without belonging to a Collection.
Adding one or more steps
1. Locate the <submission-process> tag which defines the Submission Process that you are using. If you are unsure which Submission Process
you are using, it's likely the one with name="traditional", since this is the traditional DSpace submission process.
2. Uncomment (i.e. remove the <!-- and -->) the <step> tag(s) which you want to add to that <submission-process> tag. Be sure to
uncomment the entire <step> tag.
Each name-map element within submission-map associates a collection with the name of a submission definition.
1. The traditional way is to use the "collection-handle" attribute to map a submission form to its Collection. Its collection-handle attribute is the
Handle of the collection. Its submission-name attribute is the submission definition name, which must match the name attribute of a submission-
process element (in the submission-definitions section of item-submission.xml).
a. For example, the following fragment shows how the collection with handle "12345.6789/42" is assigned the "custom" submission process:
<submission-map>
<name-map collection-handle="12345.6789/42" submission-name="custom" />
...
</submission-map>
<submission-definitions>
<submission-process name="custom">
...
</submission-process>
</submission-definitions>
2. As of 7.6, another option is to use the "collection-entity-type" attribute to map all Collections which use that Entity Type (requires Configurable
Entities) to a specific submission definition name (via the submission-name attribute, similar to above).
a. For example, the following fragment shows how to map all Collections which use the out-of-the-box Entity Types to a submission
definition of the same name:
<submission-map>
...
<name-map collection-entity-type="Publication" submission-name="Publication"/>
<name-map collection-entity-type="Person" submission-name="Person"/>
<name-map collection-entity-type="Project" submission-name="Project"/>
<name-map collection-entity-type="OrgUnit" submission-name="OrgUnit"/>
<name-map collection-entity-type="Journal" submission-name="Journal"/>
<name-map collection-entity-type="JournalVolume" submission-name="JournalVolume"/>
<name-map collection-entity-type="JournalIssue" submission-name="JournalIssue"/>
...
</submission-map>
It's a good idea to keep the definition of the default name-map, so there is always a default for collections which do not have a custom form set.
For example, suppose your collection's URL is:
http://myhost.my.edu/handle/12345.6789/42
The handle is everything after "handle/" (in the above example it is "12345.6789/42"). It should look familiar to any DSpace administrator. That is what
goes in the collection-handle attribute of your name-map element.
Assigning a default Submission Process per Entity Type
As an alternative to a collection's Handle, an Entity Type can be used as the mapping attribute. With this configuration you enable a default submission
form per Entity type, so you don't have to specify every collection's handle if you intend to use entities.
To do so, use the collection-entity-type attribute instead of the collection-handle attribute. The possible values for this attribute are the
ones that you use or that you specified in the relationship-types.xml file (please check the documentation for more information). For a
submission process to be assigned to an entity type, you must previously have associated an Entity Type with a Collection (please check: Configurable
Entities#3.ConfigureCollectionsforeachEntitytype).
For example, suppose that every time a new person is inserted into a Person collection you want a custom form. You just need to specify the submission
form to be used (submission-name="customPerson" in the example) together with the associated entity type (collection-entity-type="Person").
<submission-map>
<name-map collection-entity-type="Person" submission-name="customPerson" />
...
</submission-map>
<submission-definitions>
<submission-process name="customPerson">
...
</submission-process>
</submission-definitions>
A collection-handle configuration will prevail over this one. That is, if both a collection-entity-type and a collection-handle are defined and a collection
matches both, the submission process used will be the one defined by collection-handle (the more granular mapping prevails).
Introduction
This section explains how to customize the Web forms used by submitters and editors to enter and modify the metadata for a new item. These metadata
web forms are controlled by the Describe step within the Submission Process. However, they are also configurable via their own XML configuration file, [dspace]/config/submission-forms.xml.
In this configuration you can create alternate metadata forms, which can then be mapped to a "submission-form" step in the "item-submission.xml" (see
above).
Which fields appear on each form, and their sequence. (Keep in mind, each "form" corresponds to a "step" or section)
Labels, prompts, and other text associated with each field.
Ability to display smaller fields side-by-side in a single "row"
List of available choices for each menu-driven field.
All of the custom metadata-entry forms for a DSpace instance are controlled by a single XML file, submission-forms.xml, in the config subdirectory
under the DSpace home, [dspace]/config/submission-forms.xml. DSpace comes with a number of sample forms which implement the traditional
metadata-entry forms, and also serve as well-documented examples. Some default forms include:
"bitstream-metadata" - This is a special form which defines the metadata fields available for every uploaded bitstream (file)
"traditionalpageone" - A sample form which is used by the first "Describe" step defined in item-submission.xml
"traditionalpagetwo" - A sample form which is used by the second "Describe" step defined in item-submission.xml
A number of sample forms for various out-of-the-box Configurable Entities. These forms all have a corresponding <step> defined in
item-submission.xml. In conjunction with those <step> definitions, these forms may be used to submit new Entities of specific types. Usually this is done
by mapping that Entity-specific submission-process (in item-submission.xml) to a Collection which is used for new submissions of that Entity.
The rest of this section explains how to create your own sets of custom forms.
The name & structure of this file changed slightly in DSpace 7
The XML configuration file has a single top-level element, input-forms, which contains two elements in a specific order. The outline is as follows:
<input-forms>
  <form-definitions> ... </form-definitions>
  <form-value-pairs> ... </form-value-pairs>
</input-forms>
To put a custom form into use, three things must be wired together:
1. In "item-submission.xml", a <step-definition> of type "submission-form" must be created, with an "id" matching the name of the form (see above
for more details on step-definition)
2. In "item-submission.xml", a <submission-process> must be created/updated to use that newly defined "step".
3. Finally, also in "item-submission.xml", a Collection must be setup to use that submission process in the <submission-map> section.
So, if you modify submission-forms.xml, you may need to double-check that your changes will be used by your item-submission.xml.
Adding a Form
You can add a new form by creating a new form element within the form-definitions element. It has one attribute, name, which as described above must
match the "id" of a <step-definition> in "item-submission.xml".
A form may contain any number of rows. A row generally contains only one or two input fields (including more than one input field in a row may require the "style"
setting; see below). Each field defines an interactive dialog where the submitter enters one of the Dublin Core metadata items.
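For example, a minimal new form has this shape (the form name "custom-step" is hypothetical and must match the "id" of a <step-definition> in item-submission.xml, as described above):

```xml
<!-- Placed inside the <form-definitions> element of submission-forms.xml -->
<form name="custom-step">
    <row>
        <field>
            <dc-schema>dc</dc-schema>
            <dc-element>title</dc-element>
            <repeatable>false</repeatable>
            <label>Title</label>
            <input-type>onebox</input-type>
            <hint>Enter the main title of the item.</hint>
            <required>You must enter a title.</required>
        </field>
    </row>
</form>
```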
Composition of a Field
Each field contains the following elements, in the order indicated. The required sub-elements are so marked:
dc-schema (Required) : Name of metadata schema employed, e.g. dc for Dublin Core. This value must match the value of the schema element
defined in dublin-core-types.xml
dc-element (Required) : Name of the Dublin Core element entered in this field, e.g. contributor.
dc-qualifier: Qualifier of the Dublin Core element entered in this field, e.g. when the field is contributor.advisor the value of this element would be
advisor. Leaving this out means the input is for an unqualified DC element.
language: If set to true, a drop-down menu will be shown, containing languages. The selected language will be used as the language tag of the
metadata field. A compulsory attribute value-pairs-name must be given, containing the name of the value-pair list that contains all the languages,
e.g. <language value-pairs-name="common_iso_languages">true</language>.
repeatable: Value is true when multiple values of this field are allowed, false otherwise. When you mark a field repeatable, the UI will add an
"Add more" control to the field, allowing the user to ask for more fields to enter additional values. Intended to be used for arbitrarily-repeating
fields such as subject keywords, when it is impossible to know in advance how many input boxes to provide. Repeatable fields also support
reordering of values.
label (Required): Text to display as the label of this field, describing what to enter, e.g. "Your Advisor's Name".
input-type (Required): Defines the kind of interactive widget to put in the form to collect the Dublin Core value. Content must be one of the
following keywords:
onebox – A single text-entry box (i.e. a normal input textbox)
textarea – Large block of text that can be entered on multiple lines, e.g. for an abstract.
name – Personal name, with separate fields for family name and first name. When saved they are appended in the format 'LastName, FirstName'. (By default, this input type is unused. Author fields now use the "onebox" type to support different types of names.)
date – Calendar date. When required, demands that at least the year be entered.
series – Series/Report name and number. Separate fields are provided for series name and series number, but they are appended (with
a semicolon between) when saved.
dropdown – Choose value(s) from a "drop-down" menu list.
Requires that you include a value for the value-pairs-name attribute to specify a list of menu entries from which to choose. Use
this to make a choice from a restricted set of options, such as for the language item.
qualdrop_value – Enter a "qualified value", which includes both a qualifier from a drop-down menu and a free-text value. Used to enter
items like alternate identifiers and codes for a submitted item, e.g. the DC identifier field.
Similar to the dropdown type, requires that you include the value-pairs-name attribute to specify a menu choice list.
Because the "qualdrop_value" dynamically sets the qualifier (based on the drop-down menu), the <dc-qualifier> field MUST be
empty. The <dc-qualifier> element cannot be used with this field type.
list – Choose value(s) from a checkbox or radio button list. If the repeatable attribute is set to true, a list of checkboxes is displayed. If
the repeatable attribute is set to false, a list of radio buttons is displayed. (By default, this input type is unused.)
Requires that you include a value for the value-pairs-name attribute to specify a list of values from which to choose.
tag - A free-text field which allows you to add multiple labels/tags as values. An example is the "Subject Keywords" field.
Note: A tag field MUST be marked as <repeatable>true</repeatable>.
hint (Required): Content is the text that will appear as a "hint", or instructions, below the input field. Can be left empty, but the tag must be
present.
required: When this element is included with any content, it marks the field as a required input. If the user saves the form without entering a value
for this field, that text is displayed as a warning message. For example, <required>You must enter a title.</required> Note that leaving the
required element empty will not mark a field as required, e.g.:<required></required>
vocabulary: When specified, this field uses a controlled vocabulary defined in [dspace]/config/controlled-vocabularies/[name].xml.
This setting may be used to provide auto-complete functionality, for example in the "Subject Keywords" field (which uses the "tag" input
type). See also the "Configuring Controlled Vocabularies" section below.
regex: When specified, this field will be validated against the Regular Expression, and only values that validate successfully will be saved. An
example is commented out in the default "Author" field. If the validation fails, the following error message is shown by default: "This input is
restricted by the current pattern: {{ pattern }}.". This can be customized by adding an entry to the internationalization files with the key
error.validation.pattern.schema_element_qualifier, using the schema, element and qualifier of the field. For example:
"error.validation.pattern.dc_identifier": "The identifier can only consist of numbers". For instructions on how to add custom entries see: Customize UI labels using
Internationalization (i18n) files
style: When specified, this provides a CSS style recommendation to the UI for how to style that field. This is primarily used when displaying
multiple fields per row, so that you can tell the UI how many columns each field should use in that row. Keep in mind, these styles should follow
the Bootstrap Grid System, where the number of columns adds up to 12. An example can be seen in the default "Date of Issue" and "Publisher"
fields, which are configured to use 4 (col-sm-4) and 8 (col-sm-8) columns respectively.
visibility: the submission scope for which the field should be visible. Values allowed are submission or workflow. When one of the two options is
given, the field will be visible only for the scope provided and will be hidden otherwise.
readonly: this option can be used only together with the visibility element; it means the field should be a read-only input, instead of hidden,
outside the scope provided by the visibility element. The only value allowed is readonly, e.g.: <readonly>readonly</readonly>
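As an illustration of the style element, the two side-by-side fields mentioned above ("Date of Issue" and "Publisher") are laid out roughly like this (labels and hints are illustrative, not copied from the shipped configuration):

```xml
<row>
    <field>
        <dc-schema>dc</dc-schema>
        <dc-element>date</dc-element>
        <dc-qualifier>issued</dc-qualifier>
        <label>Date of Issue</label>
        <input-type>date</input-type>
        <hint>Enter the date of previous publication.</hint>
        <!-- 4 of the 12 Bootstrap grid columns -->
        <style>col-sm-4</style>
    </field>
    <field>
        <dc-schema>dc</dc-schema>
        <dc-element>publisher</dc-element>
        <label>Publisher</label>
        <input-type>onebox</input-type>
        <hint>Enter the name of the publisher.</hint>
        <!-- remaining 8 columns of the row -->
        <style>col-sm-8</style>
    </field>
</row>
```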
A field configured to be visible only in the submission scope, and hidden in the workflow scope:
<field>
<dc-schema>dc</dc-schema>
<dc-element>title</dc-element>
<dc-qualifier>alternative</dc-qualifier>
<repeatable>true</repeatable>
<label>Other Titles</label>
<input-type>onebox</input-type>
<hint>If the item has any alternative titles, please enter them here.</hint>
<required></required>
<visibility>submission</visibility>
</field>
A field configured to be visible in the workflow scope, and read-only in the submission scope:
<field>
<dc-schema>dc</dc-schema>
<dc-element>title</dc-element>
<dc-qualifier>alternative</dc-qualifier>
<repeatable>true</repeatable>
<label>Other Titles</label>
<input-type>onebox</input-type>
<hint>If the item has any alternative titles, please enter them here.</hint>
<required></required>
<readonly>readonly</readonly>
<visibility>workflow</visibility>
</field>
A field can be made visible depending on the value of dc.type. A new field element, <type-bind>, has been introduced to facilitate this. The <type-bind>
element takes a comma-separated list of publication types. If the element is missing or empty, the field will always be visible. In this example the field will only be visible if a
value of "thesis" or "ebook" has been entered into dc.type on an earlier page:
<field>
<dc-schema>dc</dc-schema>
<dc-element>identifier</dc-element>
<dc-qualifier>isbn</dc-qualifier>
<label>ISBN</label>
<type-bind>thesis,ebook</type-bind>
</field>
A field may be configured multiple times in the submission configuration with different values in type-bind. This is useful if a field is required for one type
but not another, or should display a different label and hint message depending on the publication type:
<field>
<dc-schema>dc</dc-schema>
<dc-element>identifier</dc-element>
<dc-qualifier>isbn</dc-qualifier>
<label>ISBN</label>
<type-bind>book,ebook</type-bind>
<required>You must enter an ISBN for this book</required>
</field>
<field>
<dc-schema>dc</dc-schema>
<dc-element>identifier</dc-element>
<dc-qualifier>isbn</dc-qualifier>
<label>ISBN of Parent Publication</label>
<type-bind>thesis,book chapter,letter</type-bind>
<hint>Enter the ISBN of the book in which this was published</hint>
</field>
If a field is required but is bound to a type that does not match the submitted publication, the required value will be ignored.
Note: When the submitter changes the Type field, other fields (usually just below it) dynamically appear. There's a brief demo of this feature in the 2022-07-
13 - DSpace 7 Q&A webinar at time 19:05. The submission process is one page, but it has collapsible sections, each of which corresponds to one of the
old "pages".
The taxonomies are described in XML following this (very simple) structure:
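The structure can be sketched as follows (a trimmed example modeled on the bundled srsc vocabulary; node ids and labels are illustrative, and nodes nest via isComposedBy):

```xml
<node id="srsc" label="Research Subject Categories">
    <isComposedBy>
        <node id="SCB11" label="HUMANITIES and RELIGION">
            <isComposedBy>
                <!-- Leaf terms are plain node elements -->
                <node id="SCB1101" label="Religion/Theology"/>
            </isComposedBy>
        </node>
    </isComposedBy>
</node>
```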
You are free to use any application you want to create your controlled vocabularies. A simple text editor should be enough for small projects. Bigger
projects will require more complex tools. You may use Protégé to create your taxonomies, save them as OWL and then use an XSL Transformation (XSLT) to
transform your documents to the appropriate format. Future enhancements to this add-on should make it compatible with standard schemas such as OWL
or RDF.
New vocabularies should be placed in [dspace]/config/controlled-vocabularies/ and must conform to the structure described above.
Vocabularies need to be associated with the corresponding metadata fields. Edit the file [dspace]/config/submission-forms.xml and place a
"vocabulary" tag under the "field" element that you want to control. Set the value of the "vocabulary" element to the name of the file that contains the
vocabulary, leaving out the extension (the add-on will only load files with extension "*.xml"). For example:
<field>
<dc-schema>dc</dc-schema>
<dc-element>subject</dc-element>
<dc-qualifier></dc-qualifier>
<repeatable>true</repeatable>
<label>Subject Keywords</label>
<input-type>onebox</input-type>
<hint>Enter appropriate subject keywords or phrases below.</hint>
<required></required>
<vocabulary>srsc</vocabulary>
</field>
The vocabulary element has an optional boolean attribute closed that can be used to force input only via the controlled-vocabulary add-on's JavaScript.
The default behaviour (i.e. without this attribute) is closed="false", which also allows the user to enter a value freely.
Controlled vocabularies have two main display types in the submission form:
1. <input-type>onebox</input-type> will display a onebox style field (optionally repeatable) which pops up the entire hierarchical vocabulary
to allow you to select an individual term.
2. <input-type>tag</input-type> will display a tag-style field (optionally repeatable) which suggests terms within the vocabulary as you type.
Adding Value-Pairs
Finally, your custom form description needs to define the "value pairs" for any fields with input types that refer to them. Do this by adding a value-pairs
element to the contents of form-value-pairs. It has the following required attributes:
value-pairs-name – Name by which an input-type (such as dropdown or qualdrop_value) refers to this list.
dc-term – Dublin Core field for which this list is selecting a value.
Each value-pairs element contains a sequence of pair sub-elements, each of which in turn contains two elements:
displayed-value – Name shown (on the web page) for the menu entry.
stored-value – Value stored in the DC element when this entry is chosen. Unlike the HTML select tag, there is no way to indicate one of the
entries should be the default, so the first entry is always the default choice.
Example
Here is a menu of types of common identifiers:
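The XML for this example is not shown above; reconstructed from the generated HTML below, it would look approximately like this (the value-pairs-name "common_identifiers" is an assumption):

```xml
<value-pairs value-pairs-name="common_identifiers" dc-term="identifier">
    <pair>
        <displayed-value>Gov't Doc #</displayed-value>
        <stored-value>govdoc</stored-value>
    </pair>
    <pair>
        <displayed-value>URI</displayed-value>
        <stored-value>uri</stored-value>
    </pair>
    <pair>
        <displayed-value>ISBN</displayed-value>
        <stored-value>isbn</stored-value>
    </pair>
</value-pairs>
```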
It generates the following HTML, which results in the menu widget below. (Note that there is no way to indicate a default choice in the custom input XML,
so it cannot generate the HTML SELECTED attribute to mark one of the options as a pre-selected default.)
<select name="identifier_qualifier_0">
<option VALUE="govdoc">Gov't Doc #</option>
<option VALUE="uri">URI</option>
<option VALUE="isbn">ISBN</option>
</select>
You must always restart Tomcat (or whatever servlet container you are using) for changes made to the submission-forms.xml and/or item-
submission.xml to take effect.
Any mistake in the syntax or semantics of the form definitions, such as poorly formed XML or a reference to a nonexistent field name, may result in errors
in the DSpace REST API & UI. The exception message (at the top of the stack trace in the dspace.log file) usually has a concise and helpful explanation of
what went wrong. Don't forget to stop and restart the servlet container before testing your fix to a bug.
Basic Settings
The Upload step in the DSpace submission process has a few configuration options which can be set with your [dspace]/config/local.cfg configuration file.
They are as follows:
spring.servlet.multipart.max-file-size (default=512MB) - Spring Boot's maximum allowable file upload size. For DSpace, we default it to 512MB (in
application.properties). But, you may wish to override the default value in your local.cfg. Example values include "512MB", "1GB", or even "-1" (to
allow unlimited). See Spring's documentation on this setting: https://spring.io/guides/gs/uploading-files/#_tuning_file_upload_limits
NOTE: Increasing this value significantly does NOT guarantee that DSpace will be able to successfully upload files of a very large size
via the web. Large uploads depend on many other factors including bandwidth, web server settings, internet connection speed, etc.
Therefore, for very large files, you may need to consider importing via command-line tools or similar.
spring.servlet.multipart.max-request-size (default=512MB) - Spring Boot's maximum allowable upload request size (i.e. the maximum total upload
size for all files in a multi-file upload). For DSpace, we default it to 512MB (in application.properties). But, you may wish to override the default
value in your local.cfg. Example values include "512MB", "1GB", or even "-1" (to allow unlimited). See Spring's documentation on this setting: https://spring.io/guides/gs/uploading-files/#_tuning_file_upload_limits
NOTE: Increasing this value significantly does NOT guarantee that DSpace will be able to successfully upload files of a very large size
via the web. Large uploads depend on many other factors including bandwidth, web server settings, internet connection speed, etc.
Therefore, for very large files, you may need to consider importing via command-line tools or similar.
webui.submit.upload.required - Whether or not all users are required to upload a file when they submit an item to DSpace. It defaults to 'true'.
When set to 'false' users will see an option to skip the upload step when they submit a new item.
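Taken together, these options can be overridden in your local.cfg; a sketch (the 2GB limits are arbitrary example values):

```properties
# local.cfg overrides (values are illustrative)
# Raise Spring Boot's per-file and total-request upload limits
spring.servlet.multipart.max-file-size = 2GB
spring.servlet.multipart.max-request-size = 2GB
# Let submitters skip the Upload step
webui.submit.upload.required = false
```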
In submission-forms.xml, a "bitstream-metadata" form defines the metadata captured for each uploaded file:
<form-definitions>
<!-- Form used for entering in Bitstream/File metadata after uploading a file -->
<form name="bitstream-metadata">
...
</form>
</form-definitions>
These access conditions are defined in a new Spring Bean configuration file [dspace]/config/spring/api/access-conditions.xml
The "uploadConfigurationService" bean maps an existing "UploadConfiguration" bean (default is "uploadConfigurationDefault") to a specific step
/section name used in item-submission.xml.
<!-- This default configuration says the <step-definition id="upload"> defined in item-submission.xml
uses "uploadConfigurationDefault" -->
<bean id="uploadConfigurationService" class="org.dspace.submit.model.UploadConfigurationService">
<property name="map">
<map>
<entry key="upload" value-ref="uploadConfigurationDefault" />
</map>
</property>
</bean>
One or more "UploadConfiguration" beans may exist, providing different options for different upload sections. An "UploadConfiguration" consists of
several properties:
name (Required): The name of the upload step/section.
metadata: The form definition (from submission-forms.xml) used to describe each uploaded file, e.g. "bitstream-metadata".
required: Whether a file upload is mandatory (defaults to true).
options: The list of "AccessConditionOption" beans offered to the submitter.
For example, the default "uploadConfigurationDefault" bean is configured as follows:
<bean id="uploadConfigurationDefault" class="org.dspace.submit.model.UploadConfiguration">
    <property name="name" value="upload" />
    <property name="metadata" value="bitstream-metadata" />
    <property name="options">
        <list>
            <ref bean="openAccess" />
            <ref bean="lease" />
            <ref bean="embargoed" />
            <ref bean="administrator"/>
        </list>
    </property>
</bean>
Any number of "AccessConditionOption" beans may be added for applying different types of access permissions to uploaded files (based on
which one the user selects). These beans are easy to add/update, and require only the following properties:
id (Required): Each defined bean MUST have a unique "id" and have "class=org.dspace.submit.model.AccessConditionOption".
groupName: Optionally, define a specific DSpace Group which this Access Condition relates to. This group will be saved to the
ResourcePolicy when this access condition is applied.
name: Give a unique name for this Access Condition. This name is stored in the ResourcePolicy "name" when this access condition is
applied.
hasStartDate: If the access condition is time-based, you can decide whether a start date is required. (true = required start date, false =
disabled/not required). This start date will be saved to the ResourcePolicy when this access condition is applied.
startDateLimit: If the access condition is time-based, you can optionally set a start date limit (e.g. +36MONTHS). This field is used to
set an upper limit to the start date based on the current date. In other words, a value of "+36MONTHS" means that users cannot set a
start date which is more than 3 years from today. This setting's value uses Solr's Date Math Syntax, and is always based on today
(NOW).
hasEndDate: If the access condition is time-based, you can enable/disable whether an end date is required. (true = required end date,
false = disabled/not required). This end date will be saved to the ResourcePolicy when this access condition is applied.
endDateLimit: If the access condition is time-based, you can optionally set an end date limit (e.g. +6MONTHS). This field is used to set
an upper limit to the end date based on the current date. In other words, a value of "+6MONTHS" means that users cannot set an end
date which is more than 6 months from today. This setting's value uses Solr's Date Math Syntax, and is always based on today (NOW).
<!-- Example access option named "embargo", which lets users specify a future date
(not more than 3 years from now) when this file will be available to Anonymous users -->
<bean id="embargoed" class="org.dspace.submit.model.AccessConditionOption">
<property name="groupName" value="Anonymous"/>
<property name="name" value="embargo"/>
<property name="hasStartDate" value="true"/>
<property name="startDateLimit" value="+36MONTHS"/>
<property name="hasEndDate" value="false"/>
</bean>
NOTE: It's possible to test the Date math syntax via command-line to see what the value would look like starting from today:
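As an illustrative stand-in (assuming GNU date is available; Solr evaluates the real expression server-side), you can approximate what "NOW+36MONTHS" resolves to:

```shell
# Approximate Solr's "NOW+36MONTHS", starting from today (GNU date assumed)
date -u -d "+36 months" "+%Y-%m-%dT%H:%M:%SZ"
```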
By default, DSpace comes with these out-of-the-box Access Conditions (which you can customize/change based on local requirements)
"administrator" - access restricts the bitstream to the Administrator group immediately (after submission completes)
"openAccess" - makes the bitstream immediately accessible to Anonymous group (after submission completes)
"embargoed" - embargoes the bitstream for a period of time (maximum of 3 years, as defined in startDateLimit default setting), after
which it becomes anonymously accessible. See also Embargo for discussion of how embargoes work in DSpace.
"lease" - makes the bitstream anonymously accessible immediately (after submission completes), but that access expires after a period
of time (maximum of 6 months, as defined in endDateLimit default setting). After that date it is no longer accessible (except to
Administrators)
1. Create a new <bean> below "uploadConfigurationDefault" with the same settings, adding only the line <property name="required"
value="false" />:
<!-- Define a new bean that sets "required=false" for the UploadConfiguration -->
<bean id="uploadConfigurationWithoutMandatory"
class="org.dspace.submit.model.UploadConfiguration">
<property name="name" value="upload" />
<property name="metadata" value="bitstream-metadata" />
<property name="required" value="false" />
<property name="options">
<list>
<ref bean="openAccess" />
<ref bean="lease" />
<ref bean="embargoed" />
<ref bean="administrator" />
</list>
</property>
</bean>
2. Now, in the item-submission.xml file, use "uploadOptional" as a <step> in your submission process, replacing the default "upload" step. For example:
<!-- This first example is a submission form that *REQUIRES* file upload
by using the existing "upload" step. -->
<submission-process name="traditional">
<step id="collection" />
<step id="traditionalpageone" />
<step id="traditionalpagetwo" />
<step id="upload" />
<step id="license" />
</submission-process>
<!-- This second example is a submission form that makes file upload *OPTIONAL*
by using the newly defined "uploadOptional" step. -->
<submission-process name="traditionalUploadOptional">
<step id="collection" />
<step id="traditionalpageone" />
<step id="traditionalpagetwo" />
<step id="uploadOptional" />
<step id="license" />
</submission-process>
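For the "uploadOptional" step id to resolve, it must also be declared as a <step-definition> in item-submission.xml and mapped in the "uploadConfigurationService" bean. A sketch, assuming the heading and processing class of the default "upload" step:

```xml
<!-- item-submission.xml: heading and processing-class values are assumptions,
     copied from the default "upload" step -->
<step-definition id="uploadOptional">
    <heading>submit.progressbar.upload</heading>
    <processing-class>org.dspace.app.rest.submit.step.UploadStep</processing-class>
    <type>upload</type>
</step-definition>

<!-- access-conditions.xml: map the new section name to the new bean -->
<bean id="uploadConfigurationService" class="org.dspace.submit.model.UploadConfigurationService">
    <property name="map">
        <map>
            <entry key="upload" value-ref="uploadConfigurationDefault" />
            <entry key="uploadOptional" value-ref="uploadConfigurationWithoutMandatory" />
        </map>
    </property>
</bean>
```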
At this point, any Collection which uses the "traditionalUploadOptional" submission-process will no longer require files to be uploaded. But, any Collection
which still uses the "traditional" submission-process will still require files to be uploaded.
To enable the optional "Item Access Conditions" step, update your item-submission.xml to include this tag in your <submission-process>:
<submission-process name="traditional">
...
<!-- This step enables embargoes and other access restrictions at the Item level -->
<step id="itemAccessConditions"/>
</submission-process>
After making this update, you will need to restart your backend (REST API) for the changes to take effect.
All available Item access conditions are defined in a new Spring Bean configuration file [dspace]/config/spring/api/access-conditions.xml
One or more "AccessConditionConfiguration" beans may exist, providing different options for different submission forms (only one should be in
use in a form at a time). By default, an "accessConditionConfigurationDefault" bean is defined. An "AccessConditionConfiguration" consists of
several properties:
name (Required): The unique name of this configuration. It must match the "id" of the step defined in your item-submission.xml
canChangeDiscoverable: Whether this configuration allows users to change the discoverability of an Item. A "discoverable" item is one
that is findable through all search/browse interfaces, provided that you have access to see that Item. A "non-discoverable" item is one
that will never be findable through search/browse (except by Administrators)... instead a direct link is necessary to view the Item. See
also DSpace Item State Definitions. When "canChangeDiscoverable" is "true", the user can modify discoverability in this submission
section. When set to "false", the user cannot modify this setting and all submitted Items will be "discoverable".
options (Required): list of all "AccessConditionOption" beans to enable for this Item access conditions step. This list will be shown to the
user to let them select which access restrictions to place on this Item.
This step uses the same "AccessConditionOption" beans as the "Upload" step, as described in the Modifying access conditions (embargo, etc.)
presented for Bitstreams documentation above. You can choose to enable the same options for both Items and Bitstreams, or provide different
options for each.
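As a sketch, a configuration bean matching the properties described above might look like the following (the class package, the option list, and the canChangeDiscoverable value are assumptions):

```xml
<!-- Illustrative sketch: the bean id comes from the text above;
     class package and property values are assumptions -->
<bean id="accessConditionConfigurationDefault"
      class="org.dspace.submit.model.AccessConditionConfiguration">
    <!-- must match the <step id="itemAccessConditions"> in item-submission.xml -->
    <property name="name" value="itemAccessConditions"/>
    <property name="canChangeDiscoverable" value="true"/>
    <property name="options">
        <list>
            <ref bean="openAccess"/>
            <ref bean="lease"/>
            <ref bean="embargoed"/>
            <ref bean="administrator"/>
        </list>
    </property>
</bean>
```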
By default, DSpace comes with these out-of-the-box Access Conditions (which you can customize/change based on local requirements)
"administrator" - access restricts the bitstream to the Administrator group immediately (after submission completes)
"openAccess" - makes the bitstream immediately accessible to Anonymous group (after submission completes)
"embargoed" - embargoes the bitstream for a period of time (maximum of 3 years, as defined in startDateLimit default setting), after
which it becomes anonymously accessible. See also Embargo for discussion of how embargoes work in DSpace.
"lease" - makes the bitstream anonymously accessible immediately (after submission completes), but that access expires after a period
of time (maximum of 6 months, as defined in endDateLimit default setting). After that date it is no longer accessible (except to
Administrators)
Generally speaking, both access restrictions will be applied. Here are some examples:
If a user selects "openAccess" in the "Item Access Conditions" step AND "embargo" in the "Upload" step for one Bitstream
Then, the Item's metadata will be publicly visible, but that single Bitstream will be embargoed.
If a user selects "openAccess" in the "Item Access Conditions" step AND "administrator" in the "Upload" step for one Bitstream
Then, the Item's metadata will be publicly visible, but that single Bitstream will only be visible to Administrators
If a user selects "administrator" in the "Item Access Conditions" step AND nothing in the "Upload" step.
Then, the Item's metadata and all Bitstreams will only be accessible to administrators.
If a user selects "embargo" in the "Item Access Conditions" step AND nothing in the "Upload" step.
Then, the Item's metadata and all Bitstreams will be embargoed. Nothing will be visible in the system until the embargo date passes.
If a user selects "embargo" in the "Item Access Conditions" step AND "openAccess" in the "Upload" step for one Bitstream.
Then, the Item's metadata will be embargoed (making it impossible to find the Item unless you are an Administrator). HOWEVER, the
bitstream will be publicly accessible immediately (but only via a direct link, as it won't be searchable in the system until the embargo date
passes).
(To test other scenarios, submit a test Item with those permissions applied. Then, edit that Item, visit the "Status" tab, and click "Authorizations" to
see what access restrictions were applied to the Item and its bitstreams.)
To enable the optional SHERPA/RoMEO policies step, update your item-submission.xml to include this tag in your <submission-process>:
<submission-process name="traditional">
...
<!-- This step shows publisher policies retrieved from SHERPA/RoMEO, when appropriate -->
<step id="sherpaPolicies"/>
</submission-process>
You must also obtain a SHERPA/RoMEO API key by registering your client application at https://v2.sherpa.ac.uk/api/ and set it in your local.cfg:
sherpa.romeo.apikey = <YOUR-API-KEY>
To display the publisher policies, the step needs to extract the ISSN of the journal to which the publication has been submitted/published and use it to
query the SHERPA/RoMEO database. This is done by an implementation of the org.dspace.app.sherpa.submit.ISSNItemExtractor interface,
configured in the org.dspace.app.sherpa.submit.SHERPASubmitConfigurationService bean:
<bean class="org.dspace.app.sherpa.submit.SHERPASubmitConfigurationService"
id="org.dspace.app.sherpa.submit.SHERPASubmitConfigurationService">
<property name="issnItemExtractors">
<list>
<bean class="org.dspace.app.sherpa.submit.MetadataValueISSNExtractor">
<property name="metadataList">
<list>
<value>dc.identifier.issn</value>
</list>
</property>
</bean>
<!-- Uncomment this bean if you have SHERPARoMEOJournalTitle enabled
<bean class="org.dspace.app.sherpa.submit.MetadataAuthorityISSNExtractor">
<property name="metadataList">
<list>
<value>dc.title.alternative</value>
</list>
</property>
</bean> -->
</list>
</property>
</bean>
Out-of-the-box implementations are provided that can extract the ISSN from either a metadata value or an authority value.
Configuring the "Identifiers" step
By default, the "Identifiers" step is disabled. To enable it, update your item-submission.xml to include this tag in your <submission-process>:
<submission-process name="traditional">
...
<!-- This step shows identifiers already registered for this in-progress item -->
<step id="identifiers"/>
...
</submission-process>
It is recommended to display this step above most others so that the submitter can clearly see any identifiers that will be created while completing their
submission.
You must also enable registration of identifiers for workspace and workflow items in dspace/config/modules/identifiers.cfg or local.cfg (this
is disabled by default):
identifiers.submission.register = true
While editing this configuration, pay attention to the filter configuration: logical item filters can be referenced here to apply conditions as to whether
an item qualifies for a DOI or not (e.g. based on metadata entered, the type of work, and so on).
Any identifiers registered for the current submission or workflow item will be displayed in a read-only section. If no identifiers are registered, a placeholder
“no identifiers” message will be displayed.
If DOI registration is configured for logical item filtering, the DOI will be minted (in a 'pending' state) or deleted as appropriate whenever the in-progress
item is saved, depending on whether it passes the filter test.
See dspace/config/modules/identifiers.cfg
Creating new Submission Steps Programmatically
First, a brief warning: Creating a new Submission Step requires some Java knowledge, and is therefore recommended to be undertaken by a Java
programmer whenever possible.
In most scenarios, this is NOT necessary, as it's much easier to configure a custom Submission Step using DescribeStep or similar.
That being said, at a higher level, creating a new Submission Step requires the following (in this relative order):
Live Import from external sources
1 General Framework
1.1 Introduction
1.2 Features
1.3 Abstraction of input format
1.4 Editing Metadata Mapping
1.5 Transformation to DSpace Item
1.5.1 Implementation of an import source for External Sources
1.5.2 Implementation of an import source for files
1.5.3 Mapping raw data to Metadata
1.5.4 Inherited methods
1.5.5 Spring configuration for External Sources
1.5.6 Metadata mapping
1.5.7 Available Metadata Contributor classes
1.6 Framework Sources Implementations
1.6.1 PubMed Integration
1.6.1.1 Introduction
1.6.1.2 Publication Lookup URL
1.6.1.3 PubMed Metadata Mapping
1.6.1.4 PubMed specific classes Config
1.6.1.4.1 Metadata mapping classes
1.6.1.4.2 Service classes
1.6.2 ArXiv Integration
1.6.2.1 ArXiv Metadata Mapping
General Framework
Introduction
This framework is used by both the REST API and User Interface to help enhance or enrich submissions. One example usage is in Importing Items via
basic bibliographic formats (Endnote, BibTex, RIS, CSV, etc.) and online services (arXiv, PubMed, CrossRef, CiNii, etc.).
Features
Lookup publications from remote sources
Support for multiple implementations
In that directory, you'll find a mapping file per import source, e.g. "arxiv-integration.xml", "bibtex-integration.xml", "endnote-integration.xml", "pubmed-
integration.xml", etc.
1. First, mapping from a file-based import (e.g. bibtex, endnote, ris, etc) to a DSpace metadata field.
a. The list of all of the enabled mappings can be found in a "MetadataFieldConfig" <util:map>, usually at the top of the config file.
<util:map id="bibtexMetadataFieldMap"
          key-type="org.dspace.importer.external.metadatamapping.MetadataFieldConfig"
          value-type="org.dspace.importer.external.metadatamapping.contributor.MetadataContributor">
    <description>Defines which metadatum is mapped on which metadatum. Note that while the key must be unique, it
        only matters here for postprocessing of the value. The mapped MetadatumContributor has full control over
        what metadatafield is generated.
    </description>
    <!-- These entry tags are the enabled mappings. The "value-ref" must map to a <bean> ID. -->
    <entry key-ref="dcTitle" value-ref="bibtexTitleContrib" />
    <entry key-ref="dcAuthors" value-ref="bibtexAuthorsContrib" />
    <entry key-ref="dcJournal" value-ref="bibtexJournalContrib" />
    <entry key-ref="dcIssued" value-ref="bibtexIssuedContrib" />
    <entry key-ref="dcJissn" value-ref="bibtexJissnContrib" />
</util:map>
b. Each field in the file is mapped to a DSpace metadata field in a "SimpleMetadataContributor" bean definition. NOTE: a large number of
DSpace defined metadata fields are already configured as MetadataFieldConfig beans in the "dublincore-metadata-mapper.xml" Spring
Config in the same directory. These may be reused in other configurations.
<!-- This example bean for BibTex says the "title" key in the BibTex file should be mapped to the
     DSpace metadata field defined in the "dcTitle" bean. This "dcTitle" bean is found in
     "dublincore-metadata-mapper.xml" and obviously maps to "dc.title" -->
<bean id="bibtexTitleContrib"
      class="org.dspace.importer.external.metadatamapping.contributor.SimpleMetadataContributor">
    <property name="field" ref="dcTitle"/>
    <property name="key" value="title" />
</bean>
2. Second, mapping from an external API query import (e.g. arxiv, pubmed, etc) to a DSpace metadata field.
a. Similar to above, the list of all of the enabled mappings can be found in a "MetadataFieldConfig" <util:map>, usually at the top of the
config file.
b. Each field in the file is mapped to a DSpace metadata field, usually in a "SimpleXPathMetadatumContributor" bean definition which also
uses a "MetadataFieldConfig" bean. NOTE: a large number of DSpace defined metadata fields are already configured as
MetadataFieldConfig beans in the "dublincore-metadata-mapper.xml" Spring Config in the same directory. These may be reused in other
configurations.
<!-- This first bean defines an XPath query ("ns:title") to map to a field (ID="arxiv.title") in DSpace -->
<bean id="arxivTitleContrib"
      class="org.dspace.importer.external.metadatamapping.contributor.SimpleXpathMetadatumContributor">
    <property name="field" ref="arxiv.title"/>
    <property name="query" value="ns:title"/>
    <property name="prefixToNamespaceMapping" ref="arxivBasePrefixToNamespaceMapping"/>
</bean>
<!-- This second bean then defines which DSpace field to use when "arxiv.title" is referenced. In other words,
     between these two beans, the "ns:title" XPath query value is saved to "dc.title". -->
<bean id="arxiv.title" class="org.dspace.importer.external.metadatamapping.MetadataFieldConfig">
    <constructor-arg value="dc.title"/>
</bean>
This method is responsible for transforming the input data into a list of ImportRecords, which will then be managed by the top layer of the framework.
The conversion from raw data to an ImportRecord can also be done using the framework itself, via the metadata mapping structure (see below).
File sources need to declare which file extensions they support. This is handled by the default method isValidSourceForFile in FileSource,
and is controlled by the entries in the list returned by the declared method public List<String> getSupportedExtensions();
The framework core is a mid-layer component which allows the conversion of raw data into metadata (ImportRecord) using XML-configurable Spring beans.
Our service should then extend AbstractImportMetadataSourceService, and use transformSourceRecords to transform raw data into ImportRecords.
RecordType is a generic type which represents a single entry of the list of data, and will be mapped to a single ImportRecord. Any metadatum
will be mapped to a specific field in the RecordType using a Contributor, as described in Metadata mapping.
Inherited methods
Method getImportSource() should return a unique identifier. Importer implementations should not be called directly; instead, call class
org.dspace.importer.external.service.ImportService. This class contains the same methods as the importer implementations, but with an extra parameter 'url'.
This url parameter should contain the same identifier that is returned by the getImportSource() method of the importer implementation you want to use.
The other inherited methods are used to query the remote source.
Here is an example of a provider which allows importing from both files and a remote source:
<bean id="PubmedImportService"
      class="org.dspace.importer.external.pubmed.service.PubmedImportMetadataSourceServiceImpl"
      scope="singleton">
    <property name="metadataFieldMapping" ref="PubmedMetadataFieldMapping"/>
    <property name="supportedExtensions">
        <list>
            <value>xml</value>
        </list>
    </property>
    ...
</bean>
This defines the service responsible for fetching and transforming the data, PubmedImportMetadataSourceServiceImpl, which is an extension
of AbstractImportMetadataSourceService as described above.
To expose this provider as a Live Import provider, we need to construct a bean of type org.dspace.external.provider.impl.LiveImportDataProvider,
where metadataSource is the bean referencing the live import service as described in "Metadata mapping", sourceIdentifier is the name of
the provider in the live import framework, and recordIdMetadata is the metadatum used as the id of the ImportRecord.
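A sketch of such a bean, reusing the "PubmedImportService" bean defined above (the bean id and the recordIdMetadata value are assumptions):

```xml
<!-- Illustrative sketch: bean id and recordIdMetadata value are assumptions -->
<bean id="pubmedLiveImportDataProvider"
      class="org.dspace.external.provider.impl.LiveImportDataProvider">
    <!-- the live import service defined above -->
    <property name="metadataSource" ref="PubmedImportService"/>
    <!-- the name of this provider within the live import framework -->
    <property name="sourceIdentifier" value="pubmed"/>
    <!-- the metadatum used as the id of each ImportRecord -->
    <property name="recordIdMetadata" value="dc.identifier.other"/>
</bean>
```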
Metadata mapping
When using an implementation of AbstractImportSourceService, a mapping of remote record fields to DSpace metadata fields can be created.
First, create an implementation of class AbstractMetadataFieldMapping with the same type set used for the importer implementation.
Each DSpace metadata field that will be used for the mapping must first be configured as a Spring bean of class
org.dspace.importer.external.metadatamapping.MetadataFieldConfig.
NOTE: A large number of these MetadataFieldConfig definitions are already provided out-of-the-box in [dspace.dir]/config/spring/api
/dublincore-metadata-mapper.xml This allows most service-specific Spring configurations to just reuse those existing MetadataFieldConfig
definitions
Now this metadata field can be used to create a mapping. To add a mapping for the "dc.title" field declared above, a new Spring bean configuration of
class org.dspace.importer.external.metadatamapping.contributor.MetadataContributor needs to be added. This interface contains a type argument.
The type needs to match the type used in the implementation of AbstractImportSourceService. The responsibility of each MetadataContributor
implementation is to generate a set of metadata from the retrieved document. How it does that is completely opaque to the AbstractImportSourceService,
but it is assumed that only one entity (i.e. item) is fed to the metadatum contributor.
For example, SimpleXpathMetadatumContributor (which implements MetadataContributor<OMElement>) can parse a fragment of XML and
generate one or more metadata values.
field: A reference to the configured spring bean of the DSpace metadata field. e.g. the "dc.title" bean declared above.
query: The xpath expression used to select the record value returned by the remote source.
Multiple record fields can also be combined into one value. To implement a combined mapping first create a SimpleXpathMetadatumContributor as
explained above for each part of the field.
Note that namespace prefixes used in the xpath queries are configured in bean "FullprefixMapping" in the same spring file.
Then create a new list in the spring configuration containing references to all SimpleXpathMetadatumContributor beans that need to be combined.
<!-- the list id shown here is illustrative -->
<util:list id="combinedAuthorList"
           value-type="org.dspace.importer.external.metadatamapping.contributor.MetadataContributor">
    <ref bean="lastNameContrib"/>
    <ref bean="firstNameContrib"/>
</util:list>
field: A reference to the configured spring bean of the DSpace metadata field. e.g. the "dc.title" bean declared above.
metadatumContributors: A reference to the list containing all the single record field mappings that need to be combined.
separator: These characters will be added between each record field value when they are combined into one field.
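As a sketch, a combined contributor wiring these three properties together might look like this (the separator value is an assumption; "lastNameContrib" and "firstNameContrib" are the single-field mappings described above):

```xml
<!-- Illustrative sketch: the separator value is an assumption -->
<bean id="authorContrib"
      class="org.dspace.importer.external.metadatamapping.contributor.CombinedMetadatumContributor">
    <property name="separator" value=", "/>
    <property name="metadatumContributors">
        <list>
            <ref bean="lastNameContrib"/>
            <ref bean="firstNameContrib"/>
        </list>
    </property>
    <property name="field" ref="dc.contributor.author"/>
</bean>
```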
Each contributor must also be added to the "MetadataFieldMap" used by the MetadataFieldMapping implementation. Each entry of this map maps a
metadata field bean to a contributor. For the contributors created above this results in the following configuration:
<util:map id="org.dspace.importer.external.metadatamapping.MetadataFieldConfig"
value-type="org.dspace.importer.external.metadatamapping.contributor.MetadataContributor">
<entry key-ref="dc.title" value-ref="titleContrib"/>
<entry key-ref="dc.contributor.author" value-ref="authorContrib"/>
</util:map>
Note that the single field mappings used for the combined author mapping are not added to this list.
Class – Description
SimpleXpathMetadatumContributor – Uses an XPath expression to map the XPath result to a metadatum.
SimpleMetadataContributor – Used for plain metadata, as shown above. Mapping is easy because it is based on the key used in the DTO.
CombinedMetadatumContributor – Uses a LinkedList of MetadataContributors to combine the resulting value of each contributor into one value.
PubMed Integration
Introduction
First read the base documentation on external importing (see above). This documentation explains the implementation of the importer framework using
PubMed (http://www.ncbi.nlm.nih.gov/pubmed) as an example.
You can choose to specify a single, specific URL. This will tell the lookup service to use only one location to look up publication information. Valid
URLs are any that are defined as a baseAddress for beans within the [src]/dspace-api/src/main/resources/spring/spring-dspace-addon-import-services.xml
Spring config file.
For example, this setting will ONLY use PubMed for lookups: publication-lookup.url=http://eutils.ncbi.nlm.nih.gov/entrez/eutils/
By default, publication-lookup.url is set to an asterisk ('*'). This default value will attempt to look up the publication using ALL configured
importServices in the [src]/dspace-api/src/main/resources/spring/spring-dspace-addon-import-services.xml Spring
config file.
Service classes
"GeneratePubmedQueryService". Generates the pubmed query which is used to retrieve the records. This is based on a given item.
"PubmedImportMetadataSourceServiceImpl". Child class of "AbstractImportMetadataSourceService", retrieving the records from pubmed.
ArXiv Integration
Simple HTML Fragment Markup
A few features of the user interface, such as the deposit license text & some metadata fields, can be marked up using a subset of HTML. This HTML
subset is defined by Angular, as we use Angular's "[innerHtml]" property to display these HTML-based fields.
Angular automatically sanitizes any HTML passed to "[innerHtml]" in order to avoid XSS attacks. See Angular docs at https://angular.io/guide
/security#preventing-cross-site-scripting-xss
At this time, Angular does NOT have a formal reference of elements/attributes which are allowed, but we've compiled a list below of currently known
acceptable elements. This list may change in later releases of Angular, but is currently maintained in Angular's "html_sanitizer.ts": https://github.com
/angular/angular/blob/main/packages/core/src/sanitization/html_sanitizer.ts
Not all DSpace fields support HTML, but the User Interface should make it clear which fields do. When adding HTML to a field, you should not create a
complete HTML document (surrounded with "<html>" tags). Just add an HTML fragment.
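For example, a deposit-license fragment using only elements commonly allowed by the sanitizer (the institution name and URL below are placeholders):

```html
<!-- An HTML fragment: no <html> or <body> wrapper; uses only commonly allowed
     elements such as p, a, em, strong -->
<p>By submitting this work, you grant <strong>Example University</strong> a
   non-exclusive right to distribute it. See the
   <a href="https://example.org/deposit-terms">full deposit terms</a>.</p>
```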
Supervision Orders
Available in 7.5 or later
In order to facilitate, as a primary objective, the opportunity for thesis authors to be supervised in the preparation of their e-theses, a supervision order
system exists to bind groups of other users (thesis supervisors) to an item in someone's pre-submission workspace. The bound group can have system
policies associated with it that allow different levels of interaction with the student's item; a small set of default policy groups are provided:
Once the default set has been applied, a system administrator may modify them as they would any other policy set in DSpace.
This functionality could also be used in situations where researchers wish to collaborate on a particular submission, although there is no particular
collaborative workspace functionality.
For Items that are in the "Workspace", it is possible to create a supervision order by clicking on the "Supervision" button.
After clicking "Supervision", you'll be able to create a Supervision order by selecting the "Type of Order" (EDITOR or OBSERVER) and assigning those
permissions to an existing DSpace Group.
In DSpace, there are currently two Types of Orders:
EDITOR - The supervising group is given ADD, WRITE, and READ access to the item (but not any bundles or bitstreams that already exist). Any
new bundles or bitstreams inherit the supervising group's policy to permit ADD, WRITE and READ operations.
NOTE: Keep in mind, this does NOT give the supervising group REMOVE policies on the Bundle or DELETE on the Item. This means
the supervising group is only able to edit the metadata & add additional bitstreams. They are NOT able to remove existing bitstreams or
delete the Item unless additional policies are manually added.
OBSERVER - The supervising group is given READ access to the item (but not to any bundles or bitstreams that already exist). Any new bundles
or bitstreams inherit the supervision group's policy to permit READ operations.
NOTE: At this time, there is a known issue where members of the OBSERVER group will still see the "Edit" and "Delete" buttons, but are unable to
perform those actions. https://github.com/DSpace/dspace-angular/issues/2094
Keep in mind, you can adjust the permissions defined by any order after creating it! Simply click on the "Policies" button on the "Administer
Workflow" page to adjust the default policies for that supervising group.
Supervising a Submission
Once a Supervision Order is created (see above step), all group members for the supervising group will see that Item in their "Supervised Items" list on
their MyDSpace page:
Based on the type of Supervision Order (or additional permissions provided), all members of the supervising group will be able to view and/or edit that in-progress submission.
On that page, a "Supervised By" filter exists, allowing you to locate all currently supervised items by the assigned group:
You can click on the "Supervised by" label under the supervised item to remove the existing supervision order. New orders can be added by clicking the
"Supervision" button. You can also adjust any supervising group permissions by editing the policies directly by clicking on the "Policies" button.
Items and Metadata
Authority Control of Metadata Values
Batch Metadata Editing
DOI Digital Object Identifier
Item Level Versioning
Mapping/Linking Items to multiple Collections
Metadata Recommendations
Moving Items
PDF Citation Cover Page
Updating Items via Simple Archive Format
Authority Control of Metadata Values
1 Introduction
2 Simple choice management for DSpace submission forms
2.1.1 Example
2.2 Use simple choice management to add language tags to metadata fields
3 Hierarchical Taxonomies and Controlled Vocabularies
3.1 Default Hierarchical Controlled Vocabularies
3.2 Enabling / Disabling a Hierarchical Controlled Vocabulary
3.3 How to invoke a controlled vocabulary from submission-forms.xml
4 Authority Control: Enhancing DSpace metadata fields with Authority Keys
4.1 How it works
4.2 Original source:
Introduction
With DSpace you can describe digital objects such as text files, audio, video or data to facilitate easy retrieval and high quality search results. These
descriptions are organized into metadata fields that each have a specific designation. For example: dc.title stores the title of an object, while dc.subject is
reserved for subject keywords.
For many of these fields, including title and abstract, free text entry is the proper choice, as the values are likely to be unique. Other fields are likely to have
values drawn from controlled sets. Such fields include unique names, subject keywords, document types and other classifications. For those kinds of fields,
the overall quality of the repository metadata increases if values with the same meaning are normalized across all items. Additional benefits can be gained
if unique identifiers are also associated with the canonical text values of a particular metadata field.
This page covers features included in the DSpace submission forms that allow repository managers to enforce the usage of normalized terms for those
fields where this is required in their institutional use cases. DSpace offers simple and straightforward features, such as definitions of simple text values for
dropdowns, as well as more elaborate integrations with external vocabularies such as the Library of Congress Naming Authority.
Example
A value-pairs definition generates the following HTML, which results in a dropdown menu widget:
<select name="identifier_qualifier_0">
<option VALUE="govdoc">Gov't Doc #</option>
<option VALUE="uri">URI</option>
<option VALUE="isbn">ISBN</option>
</select>
Each value-pairs element contains a sequence of pair sub-elements, each of which in turn contains two elements:
displayed-value – Name shown (on the web page) for the menu entry.
stored-value – Value stored in the DC element when this entry is chosen. Unlike the HTML select tag, there is no way to indicate one of the
entries should be the default, so the first entry is always the default choice.
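Such a menu could be defined in submission-forms.xml with a value-pairs block along these lines (a sketch; the value-pairs-name shown here is an assumption):

```xml
<value-pairs value-pairs-name="common_identifiers" dc-term="identifier">
  <pair>
    <displayed-value>Gov't Doc #</displayed-value>
    <stored-value>govdoc</stored-value>
  </pair>
  <pair>
    <displayed-value>URI</displayed-value>
    <stored-value>uri</stored-value>
  </pair>
  <pair>
    <displayed-value>ISBN</displayed-value>
    <stored-value>isbn</stored-value>
  </pair>
</value-pairs>
```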
As you can see, each node element has an id and a label attribute. It can contain an isComposedBy element, which, in turn, consists of a list of other
nodes.
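A minimal vocabulary file therefore looks something like this (an illustrative sketch; the ids and labels are invented):

```xml
<node id="animals" label="Animals">
  <isComposedBy>
    <node id="mammals" label="Mammals">
      <isComposedBy>
        <node id="horses" label="Horses"/>
        <node id="dogs" label="Dogs"/>
      </isComposedBy>
    </node>
  </isComposedBy>
</node>
```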
You are free to use any application you want to create your controlled vocabularies. A simple text editor should be enough for small projects. Bigger
projects will require more complex tools. You may use Protégé to create your taxonomies, save them as OWL, and then use an XSLT stylesheet to
transform your documents to the appropriate format. Future enhancements to this add-on should make it compatible with standard schemas such as OWL
or RDF.
nsi - nsi.xml - The Norwegian Science Index (in the Norwegian language)
srsc - srsc.xml - Swedish Research Subject Categories (in the English language, with notes in Swedish)
You may create your own hierarchical controlled vocabulary by using either of those as a model. All valid hierarchical vocabularies should align with the
"controlledvocabulary.xsd" schema available in that same directory.
To disable a hierarchical controlled vocabulary, simply remove it from all your fields in your "submission-forms.xml". You can also disable all controlled
vocabularies by commenting out the "DSpaceControlledVocabulary" plugin in "authority.cfg":
authority.cfg
plugin.selfnamed.org.dspace.content.authority.ChoiceAuthority = \
org.dspace.content.authority.DCInputAuthority, \
org.dspace.content.authority.DSpaceControlledVocabulary
The vocabulary element has an optional boolean attribute closed that can be used to restrict input to values selected via the controlled-vocabulary add-on's
Javascript widget. The default behaviour (i.e. without this attribute) is closed="false". This allows the user to enter values as free text in addition to selecting them from
the controlled vocabulary.
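For example, a field definition in submission-forms.xml might reference a vocabulary like this (a sketch; the field chosen here is illustrative):

```xml
<field>
  <dc-schema>dc</dc-schema>
  <dc-element>subject</dc-element>
  <label>Subject Keywords</label>
  <input-type>onebox</input-type>
  <hint>Select a subject from the vocabulary.</hint>
  <required></required>
  <vocabulary closed="false">srsc</vocabulary>
</field>
```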
Authority: an external source of fixed values for a given domain, each unique value identified by a key. For example, the OCLC LC Name Authority Service, ORCID or VIAF.
Authority Record: the information associated with one of the values in an authority; may include alternate spellings and equivalent forms of the value, etc.
Authority Key: an opaque, hopefully persistent, identifier corresponding to exactly one record in the authority.
The fact that this functionality deals with external sources of authority makes it inherently different from the functionality for controlled vocabularies.
Another difference is that the authority control is asserted everywhere metadata values are changed, including unattended/batch submission, SWORD
package submission, and the administrative UI.
How it works
TODO
Original source:
Authority Control of Metadata Values original development proposal for DSpace 1.6
ORCID Authority
1 Introduction
2 Use case and high level benefits
3 Enabling the ORCID authority control
3.1 Settings to enable in local.cfg
3.2 Enabling the ORCID beans
4 Importing existing authors & keeping the index up to date
4.1 Different possible use cases for Index-authority script
4.1.1 Metadata value WITHOUT authority key in metadata
4.1.2 Metadata that already has an authority key from an external source (NOT auto-generated by DSpace)
4.1.3 Metadata that has already a new dspace generated uid authority key
4.1.4 Processing on records in the authority cache
4.2 Submission of new DSpace items - Author lookup
4.3 Admin Edit Item
4.4 Editing existing items using Batch CSV Editing
4.5 Storage of related metadata
5 Configuration
6 Adding additional fields under ORCID
7 Integration with other systems beside ORCID
8 FAQ
8.1 Which information from ORCID is currently indexed in the authority cache?
8.2 How can I index additional fields in the authority cache?
8.3 How can I use the information stored in the authority cache?
8.4 How to add additional metadata fields in the authority cache that are not related to ORCID?
8.5 What happens to data if another authority control was already present?
8.6 Where can I find the URL that is used to lookup ORCIDs?
ORCID Authority can only pull data from ORCID and link it to a DSpace metadata field
ORCID Authority allows you to link up DSpace metadata fields (added during the submission process) to a person's ORCID identifier. The main use case
for this feature is to allow you to link author metadata fields to their ORCID identifier. This is a very basic ORCID integration that has existed since DSpace
5.x.
If you're looking for ORCID Authentication & the ability to synchronize data from DSpace to an ORCID profile, then you should be using the ORCID
Integration feature instead.
Introduction
The ORCID integration adds ORCID compatibility to the existing solutions for Authority control in DSpace. String names of authors are still being stored in
DSpace metadata. The authority key field is leveraged to store a uniquely generated internal ID that links the author to more extended metadata, including
the ORCID ID and alternative author names.
This extended metadata is stored and managed in a dedicated SOLR index, the DSpace authority cache.
Lowering the threshold to adopt ORCID for the members of the DSpace community
ORCID’s API has enabled developers across the globe to build points of integration between ORCID and third party applications. Up until today, this
meant that members of the DSpace community were still required to implement front-end and back-end modifications to the DSpace source code in order
to leverage these APIs. As DSpace aims to provide turnkey Institutional Repository functionality, the platform is expected to provide more functionality out
of the box. Only an elite selection of members in the DSpace community has software development resources readily available to implement this kind of
functionality. By contributing a solution directly to the core DSpace codebase, this threshold to adopt ORCID functionality in DSpace repositories is
effectively lowered. The ultimate goal is to allow easy adoption of ORCID without customization of the DSpace software, by allowing repository
administrators to enable or disable functionality by means of user friendly configuration.
This proposal aims to provide user-friendly features for both repository administrators and non-technical end users of the system. The addition of
ORCID functionality to DSpace should not come at the cost of making the system more difficult for administrators and end users to use.
Scope
With this vision in mind, the project partners wanted to tackle the first phases for repository managers of existing DSpace repositories: ensuring that ORCIDs are
properly associated with new works entering the system, as well as providing functionality to efficiently batch-update content already existing in the system
with unambiguous author identity information.
authority.cfg
plugin.named.org.dspace.content.authority.ChoiceAuthority = \
org.dspace.content.authority.SolrAuthority = SolrAuthorAuthority
The feature relies on the following configuration parameters in authority.cfg, solrauthority.cfg and orcid.cfg. To activate the default
settings it suffices to remove the comment hashes ("#") from the following lines, or copy them into your local.cfg. See the section at the bottom of this page
for what these parameters mean exactly and how you can tweak the configuration.
# These settings can be found in your authority.cfg (or could be added to local.cfg)
choices.plugin.dc.contributor.author = SolrAuthorAuthority
choices.presentation.dc.contributor.author = authorLookup
authority.controlled.dc.contributor.author = true
authority.author.indexer.field.1=dc.contributor.author
Beginning with DSpace 7, you must specify which ORCID API you wish to use. A Client ID/Secret is also required, but can be obtained for free for the
Public API: https://info.orcid.org/documentation/features/public-api/
If you are an ORCID Member Institution, you can use the Member API instead. The Member API is required for additional ORCID Integration features, but
is NOT required for this basic ORCID Authority feature.
# You do NOT need to pay for a Member API ID to use ORCID Authority.
# Instead, you just need a Public API ID from a free ORCID account.
orcid.application-client-id = MYID
orcid.application-client-secret = MYSECRET
The final part of configuration is to add the authority consumer in front of the list of event consumers (in dspace.cfg or local.cfg). Add "authority" in front of
the list as displayed below.
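With the default consumer list, the resulting line looks like this:

```
event.dispatcher.default.consumers = authority, versioning, discovery, eperson, harvester
```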
Simply uncomment these settings as-is & restart Tomcat. They will pull their configs from orcid.cfg or your local.cfg.
orcid-authority-services.xml
<!-- This bean & alias are commented out by default. Simply uncomment them -->
<alias name="OrcidSource" alias="AuthoritySource"/>
<bean name="OrcidSource" class="org.dspace.authority.orcid.Orcidv3SolrAuthorityImpl">
<property name="clientId" value="${orcid.application-client-id}" />
<property name="clientSecret" value="${orcid.application-client-secret}" />
<property name="OAUTHUrl" value="${orcid.token-url}" />
<property name="orcidRestConnector" ref="orcidRestConnector"/>
</bean>
[dspace]/bin/dspace index-authority
This will iterate over every metadata value under authority control and create records for them in the authority index. Metadata values without an authority key will
each be updated with an auto-generated authority key; these will not be matched in any way with other existing records. Metadata values with an authority
key that does not already exist in the index will be indexed under those keys. Metadata values with an authority key that already exists in the index will
be re-indexed the same way; those existing records remain unchanged.
All occurrences of “Luyten, Bram” in the DSpace item metadata will become linked with the same generated uid.
Metadata that already has an authority key from an external source (NOT auto-generated by DSpace)
“Snyers, Antoine” is present with authority key “u12345”
The old authority key needs to be preserved in the item metadata and duplicated in the cache.
“u12345” will be copied to the authority cache and used as the authority key there.
Metadata that has already a new dspace generated uid authority key
Item metadata already contains an author with name “Haak, Danielle” and a uid in the authority field 3dda2571-6be8-4102-a47b-5748531ae286
This uid is preserved and no new record is created in the authority index.
When ORCID Authority is enabled, the Author field can be used to search entries in ORCID. Simply type in an Author name to search your locally indexed
authors and authors in ORCID.
Select an author entry from the list to add that author. The list of authors is updated as you type.
Authors that already appear somewhere in the repository are differentiated from the authors that have been retrieved from ORCID.
On the edit metadata page, under the values of the dc.contributor.author fields, an extra line shows the author ID together with a lock icon and a Lookup
button. The author ID cannot be changed manually; however, the Lookup button will help you change the author name and ID at the same time.
Clicking the Lookup button brings up the Lookup User Interface. This works just the same way as in the submission forms.
Editing existing items using Batch CSV Editing
Instructions on how to use the Batch CSV Editing are found on the Batch Metadata Editing documentation page.
ORCID integration is provided through the Batch CSV Editing feature with an extra available header, "ORCID:dc.contributor.author". The usual CSV
headers only contain the metadata fields: e.g. "dc.contributor.author". In addition to the traditional header, another dc.contributor.author header can be
added with the "ORCID:" prefix. The values in this column are expected to be ORCID iDs.
For each of the ORCID authors a lookup will be done and their names will be added to the metadata. All the non-ORCID authors will be added as well. The
authority keys and solr records are added when the reported changes are applied.
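For example (a sketch; the item id, collection handle and ORCID iD shown are illustrative):

```
id,collection,dc.contributor.author,ORCID:dc.contributor.author
350,123456789/2,"Smith, John",0000-0002-1825-0097
```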
Storage of related metadata
ORCID authorities do not only link a digital identifier to a name. They group a wealth of metadata, ranging from alternative names and email addresses to keywords
about the author's works and much more. The metadata is obtained by querying the ORCID web services. To avoid querying the ORCID web services
every time, all this related metadata is gathered in a "metadata authority cache" that DSpace can access directly.
In practice the cache is provided by an Apache Solr server. When a look-up is made and an author is chosen that is not yet in the cache, a record is
created from the ORCID profile and added to the cache with the list of related metadata. The value of the Dublin Core metadata is based on the first and
last name as they are set in the ORCID profile. The authority key for this value links directly to the Solr document's id. DSpace does not provide a way to
edit these records manually.
The information in the authority cache can be updated by running the following command line operation:
Arguments description
-i update specific solr records with the given internal ids (comma-separated)
This will iterate over every solr record currently in use (unless the -i argument is provided), query the ORCID web service for the latest data and update the
information in the cache. If configured, the script will also update the metadata of the items in the repository where applicable.
The configuration property can be set in config/modules/solrauthority.cfg, or overridden in your local.cfg (see Configuration Reference).
When it is set to true and this script is run, then each time an authority record's information is updated the whole repository will be scanned for that authority. Every
metadata field with this authority key will be updated with the value of the updated authority record.
Configuration
In the Enabling the ORCID authority control section, you have been told to add this block of configuration.
For all of the configuration options described below, you can use either dspace.cfg or local.cfg. Either will work. It is possible that, when you compile your
code with Maven, and you have tests enabled, your build will fail. DSpace unit tests utilize parts of dspace.cfg, and the configuration options you will utilize
below are known to cause unit test errors. The easiest way to avoid this situation is to use the local.cfg file.
solr.authority.server=${solr.server}/authority
choices.plugin.dc.contributor.author = SolrAuthorAuthority
choices.presentation.dc.contributor.author = authorLookup
authority.controlled.dc.contributor.author = true
authority.author.indexer.field.1=dc.contributor.author
# You do NOT need to pay for a Member API ID to use ORCID Authority.
# Instead, you just need a Public API ID from a free ORCID account.
# https://info.orcid.org/documentation/features/public-api/
orcid.application-client-id = MYID
orcid.application-client-secret = MYSECRET
The ORCID Integration feature is an extension of the authority control in DSpace. Most of these properties are extensively explained on the Authority
Control of Metadata Values documentation page. These will be revisited, but first we cover the newly added properties.
The solr.authority.server is the URL of the Solr authority core. Usually this would be on the solr.server next to the oai, search and statistics cores.
authority.author.indexer.field.1 and the subsequent increments configure which fields will be indexed in the authority cache. However,
before adding extra fields into the Solr cache, please read the section about Adding additional fields under ORCID.
That's it for the novelties. Moving on to the generic authority control properties:
With the authority.controlled property, every metadata field that needs to be authority controlled is configured. This involves every type of
authority control, not only the fields for ORCID integration.
The choices.plugin should be configured for each metadata field under authority control. Setting the value to SolrAuthorAuthority tells
DSpace to use the Solr authority cache for this metadata field, cf. Storage of related metadata.
The choices.presentation should be configured for each metadata field as well. The traditional values for this property are select|suggest|
lookup. A new value, authorLookup, has been added to be used in combination with the SolrAuthorAuthority choices plugin. While the other
values can still be used, the authorLookup provides a richer user interface in the form of a popup on the submission page.
The browse indexes need to point to the new authority-controlled index:
webui.browse.index.2 = author:metadata:dc.contributor.*,dc.creator:text
should become
webui.browse.index.2 = author:metadataAuthority:dc.contributor.author:authority
More existing configuration properties are available, but their values are independent of this feature and their default values are usually fine:
choices.closed, authority.required, authority.minconfidence.
For the cache update script, one property can be set in config/modules/solrauthority.cfg:
The final part of configuration is to add the authority consumer in front of the list of event consumers. Add "authority" in front of the list as displayed below.
event.dispatcher.default.consumers = authority, versioning, discovery, eperson, harvester
Without the consumer there is no automatic indexing of the authority cache and the metadata will not even have authority keys.
Changes to the configuration always require a server restart before they're in effect.
First, add the same configuration fields that were added for "dc.contributor.author":
choices.plugin.dc.contributor.editor = SolrAuthorAuthority
choices.presentation.dc.contributor.editor = authorLookup
authority.controlled.dc.contributor.editor = true
authority.author.indexer.field.1=dc.contributor.author
authority.author.indexer.field.2=dc.contributor.editor
This is enough to get the look-up interface on the submission page and on the edit metadata page. The authority keys will be added and indexed with the
information from orcid just as it happens with the Authors.
But you're not completely done yet; there is one more configuration step. Without it, when new editors that are not retrieved through the external look-up are
added in the metadata, their first and last names will not be displayed in the look-up interface the next time you look for them.
To fix this, open the file at config/spring/api/orcid-authority-services.xml and find this spring bean:
The map inside the "fieldDefaults" property needs an additional entry for the editor field:
<entry key="dc_contributor_editor">
<bean class="org.dspace.authority.PersonAuthorityValue"/>
</entry>
With this last change everything is set up to work correctly. The rest of this configuration file is meant for Java developers that wish to provide integration
with other systems besides ORCID. Developers that wish to display other fields than first and last name can also have a look in that section.
Note: Each metadata field has a separate set of authority records. Authority keys are not shared between different metadata fields. E.g. multiple
dc.contributor.author values can have the same authority key and point to the same authority record in the cache. But when an ORCID is chosen for a
dc.contributor.editor field, a separate record is made in the cache. Both records are updated from the same source and will contain the same information.
The difference is that, when performing a look-up of a person that has been introduced as an authority for an author field but not yet as an editor, it will
show as a record that is not yet present in the repository cache.
FAQ
The system/dspace related fields are: id, field, value, deleted, creation_date, last_modified_date, authority_type.
The fields with data coming directly from ORCID are: first_name, last_name, name_variant, orcid_id, label_researcher_url, label_keyword,
label_external_identifier, label_biography, label_country. The field all_labels contains all the values from the other fields starting with "label_".
The files preceded with a '+' would be necessary to modify to add more info into the cache.
How to add additional metadata fields in the authority cache that are not related to ORCID?
Take the same configuration steps as for adding additional fields under ORCID. Currently the ORCID suggestions cannot be turned off for specific fields;
that would require custom code.
In short: authority keys that exist prior to enabling the solrauthority are kept. They just won't show in the look-up until they are indexed.
Batch Metadata Editing
1 Batch Metadata Editing Tool
1.1 Export Function
1.1.1 Web Interface Export
1.1.2 Command Line Export
1.2 Import Function
1.2.1 Web Interface Import
1.2.2 Command Line Import
1.3 CSV Format
1.3.1 File Structure
1.4 Editing the CSV
1.4.1 Editing Collection Membership
1.4.2 Adding Metadata-Only Items
1.4.3 Deleting Metadata
1.4.4 Performing 'actions' on items
1.4.5 Migrating Data or Exchanging data
1.4.6 Common Issues
1.4.6.1 Metadata values in CSV export seem to have duplicate columns
1.4.6.2 DSpace responds with "No changes were detected" when CSV is uploaded
1.5 Batch Editing, Entities and Relationships
1.5.1 Background about entities and virtual metadata
1.5.2 Admin CSV export
1.5.3 Admin CSV import
For information about configuration options for the Batch Metadata Editing tool, see Batch Metadata Editing Configuration
Export Function
Exporting search results to CSV was not added until DSpace 7.3
As of DSpace 7.3, it is possible to Export search results to a CSV (similar to 6.x). When logged in as an Administrator, after performing a search a new
"Export search results as CSV" button appears. Clicking it will export the metadata of all items in your search results to a CSV. This CSV can then be
used to perform batch metadata updates (based on the items in your search results). - Release Notes#7.3ReleaseNotes
Please see below documentation for more information on the CSV format and actions that can be performed by editing the CSV.
-i or --id The Item, Collection, or Community handle or Database ID to export. If not specified, all items will be exported.
-a or --all Include all the metadata fields that are not normally changed (e.g. provenance) or those fields you configured in the [dspace]
/config/modules/bulkedit.cfg to be ignored on export.
Example:
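A sketch of such an export invocation (this assumes the standard `metadata-export` launcher command; exact flags may differ by version):

```
[dspace]/bin/dspace metadata-export -i 1989.1/24 -f /batch_export/col_14.csv
```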
In the above example we have requested that the collection with handle '1989.1/24' be exported in its entirety to the file 'col_14.csv' in the
'/batch_export' directory.
Please see below documentation for more information on the CSV format and actions that can be performed by editing the CSV .
Import Function
Importing large CSV files
It is not recommended to import CSV files of more than 1,000 lines (i.e. 1,000 items). When importing files larger than this, it may be difficult for an
Administrator to accurately verify the changes that the import tool states it will make. In addition, depending on the memory available to the DSpace site,
large files may cause 'Out Of Memory' errors part way through the import process.
First, complete all editing of the CSV and save your changes
Login as an Administrative User
In the sidebar, select "Import", then "Metadata", and drag & drop the CSV file
Validate a Batch Metadata CSV was not added until DSpace 7.3
As of DSpace 7.3, it is now possible to validate a Batch Metadata CSV before applying changes (similar to 6.x). When uploading a CSV for batch updates
(using "Import" menu), a new "Validate Only" option is selected by default. When selected, the uploaded CSV will only be validated & you'll receive a report
of the detected changes in the CSV. This allows you to verify the changes are correct before applying them. (NOTE: applying the changes requires re-
submitting the CSV with the "Validate Only" option deselected) - Release Notes#7.3ReleaseNotes
-s or --silent Silent mode. The import function does not prompt you to make sure you wish to make the changes.
-e or --email The email address of the user. This is only required when adding new items.
-w or --workflow When adding new items, the program will queue the items up to use the Collection Workflow processes.
-n or --notify when adding new items using a workflow, send notification emails.
-t or --template When adding new items, use the Collection template, if it exists.
Silent Mode should be used carefully. It is possible (and probable) that you can overlay the wrong data and cause irreparable damage to the database.
Example
If you are wishing to upload new metadata without bitstreams, at the command line:
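A sketch of such an invocation, combining the arguments described above (this assumes the standard `metadata-import` launcher command; the file path and email address are illustrative):

```
[dspace]/bin/dspace metadata-import -f /batch_import/new_items.csv -e admin@myu.edu -w -n -t
```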
In the above example we included all the arguments. This would add the metadata and cause the workflow, notification, and template options all to be
applied to the items being added.
CSV Format
The CSV (comma separated values) files that this tool can import and export abide by the RFC 4180 CSV format. This means that new lines and
embedded commas can be included by wrapping elements in double quotes, and double quotes themselves can be included by doubling them. The code does
all this for you, and any good CSV editor such as Excel or OpenOffice will comply with this convention.
All CSV files are also in UTF-8 encoding in order to support all languages.
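The quoting rules can be demonstrated with any RFC 4180-aware library; this illustrative Python sketch shows how an embedded comma and double quotes are escaped:

```python
import csv
import io

# A title containing both an embedded comma and double quotes
rows = [
    ["id", "dc.title"],
    ["42", 'A "quoted" title, with a comma'],
]

# Write the rows using standard CSV quoting rules
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerows(rows)
print(buf.getvalue())
# id,dc.title
# 42,"A ""quoted"" title, with a comma"
```

Note how the value containing special characters is wrapped in double quotes and each embedded double quote is doubled, exactly as the tool expects.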
File Structure
The first row of the CSV must define the metadata values that the rest of the CSV represents. The first column must always be "id", which refers to the
item's internal database ID. All other columns are optional. The other columns contain the Dublin Core metadata fields in which the data is to reside.
id,collection,dc.title,dc.contributor,dc.date.issued,etc,etc,etc.
Subsequent rows in the csv file relate to items. A typical row might look like:
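For instance, given the header above (the values shown are illustrative):

```
350,123456789/2,A Typical Title,"Smith, John",2023-04-01
```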
If you want to store multiple values for a given metadata element, they can be separated with the double-pipe '||' (or another character that you defined in
your modules/bulkedit.cfg file). For example:
Horses||Dogs||Cats
Elements are stored in the database in the order that they appear in the CSV file. You can use this to order elements where order may matter, such as
authors, or controlled vocabulary such as Library of Congress Subject Headings.
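For example, a row storing three ordered subject keywords for an item would look like this (illustrative values):

```
id,dc.subject
42,Horses||Dogs||Cats
```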
If you are editing with Microsoft Excel, be sure to open the CSV in Unicode/UTF-8 encoding
By default, Microsoft Excel may not correctly open the CSV in Unicode/UTF-8 encoding. This means that special characters may be improperly displayed
and also can be "corrupted" during re-import of the CSV.
You need to tell Excel this CSV is Unicode, by importing it as follows. (Please note these instructions are valid for MS Office 2007 and 2013. Other Office
versions may vary)
1. The "id" column MUST remain intact. This column also must always have a value in it.
2. To simplify the CSV, you can simply remove any columns you do NOT wish to edit (except for "id" column, see #1). Don't worry, removing the
entire column won't delete metadata (see #3)
3. When importing a CSV file, the importer will overlay the metadata onto what is already in the repository to determine the differences. It only acts
on the contents of the CSV file, rather than on the complete item metadata. This means that the CSV file that is exported can be manipulated
quite substantially before being re-imported. Rows (items) or Columns (metadata elements) can be removed and will be ignored.
a. For example, if you only want to edit "dc.subject", you can remove ALL columns EXCEPT for "id" and "dc.subject" so that you can just
manipulate the "dc.subject" field. On import, DSpace will see that you've only included the "dc.subject" field in your CSV and therefore
will only update the "dc.subject" metadata field for any items listed in that CSV.
4. Because removing an entire column does NOT delete metadata value(s), if you actually wish to delete a metadata value you should leave the
column intact, and simply clear out the appropriate row's value (in that column).
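The overlay behaviour described in the list above can be sketched as follows. This is an illustrative model, not DSpace's actual importer code: only columns present in the CSV are compared, and an empty cell in a present column requests deletion of that value.

```python
def diff_row(existing: dict, csv_row: dict) -> dict:
    """Compute the changes a CSV row implies for one item.

    'existing' holds the item's current metadata; 'csv_row' holds
    only the columns present in the uploaded CSV. Columns absent
    from the CSV are ignored; an empty cell requests deletion.
    """
    changes = {}
    for field, new_value in csv_row.items():
        if field == "id":        # the id column is never metadata
            continue
        old_value = existing.get(field, "")
        if new_value != old_value:
            changes[field] = new_value   # "" means: delete the value
    return changes

item = {"dc.title": "Old title", "dc.subject": "Horses"}
# Only dc.subject is present in the CSV, so dc.title is untouched:
diff_row(item, {"id": "350", "dc.subject": ""})
# -> {"dc.subject": ""}  (deletion request)
```

Note how removing the dc.title column from the CSV leaves that field alone, while an empty dc.subject cell produces a deletion.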
Deleting Metadata
It is possible to perform bulk deletions of certain metadata fields from an exported file. For example, let's say you have used keywords (dc.subject) that need to be removed en masse. You would leave the column (dc.subject) intact, but remove the data in the corresponding rows.
In addition, the 'action' column can be used to perform one of the following actions on each item:
1. 'expunge' This permanently deletes an item. Use with care! This action must be enabled by setting 'allowexpunge = true' in modules/bulkedit.cfg
2. 'withdraw' This withdraws an item from the archive, but does not delete it.
3. 'reinstate' This reinstates an item that has previously been withdrawn.
If an action makes no change (for example, asking to withdraw an item that is already withdrawn) then, just like metadata that has not changed, this will be
ignored.
Migrating Data or Exchanging Data
It is possible that you have data in one Dublin Core (DC) element and you really wish to have it in another. An example would be that your staff have input Library of Congress Subject Headings in the subject field (dc.subject) instead of the LCSH field (dc.subject.lcsh). Follow these steps and your data is
migrated upon import:
1. Insert a new column. The first row should be the new metadata element. (We will refer to it as the TARGET)
2. Select the column/rows of the data you wish to change. (We will refer to it as the SOURCE)
3. Cut and paste this data into the new column (TARGET) you created in Step 1.
4. Leave the column (SOURCE) you just cut and pasted from empty. Do not delete it.
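Outside of a spreadsheet, the same migration can be scripted. Below is a hypothetical sketch using Python's csv module (the field names are taken from the example above; this is not a DSpace tool):

```python
import csv
import io

def migrate_column(csv_text: str, source: str, target: str) -> str:
    """Move values from SOURCE to TARGET, keeping SOURCE present but empty.

    The (now empty) source column is kept so that the import deletes
    the old values, as described in the steps above.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    fieldnames = list(rows[0].keys()) if rows else []
    if target not in fieldnames:
        fieldnames.append(target)           # step 1: new TARGET column
    for row in rows:
        row[target] = row.get(source, "")   # steps 2-3: cut and paste
        row[source] = ""                    # step 4: clear, don't delete
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

csv_in = "id,dc.subject\n350,Microbiology\n"
migrated = migrate_column(csv_in, "dc.subject", "dc.subject.lcsh")
# migrated == "id,dc.subject,dc.subject.lcsh\n350,,Microbiology\n"
```

The empty dc.subject column in the output is deliberate: on import it tells DSpace to delete the old values, while dc.subject.lcsh receives the migrated data.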
Common Issues
It's possible the CSV was not saved properly after editing. Check that the edits are in the CSV, and that there were no backend errors in the
DSpace logs (which would be an indication of an invalid or corrupted CSV file)
Depending on the version of DSpace, you may be encountering this known bug with processing linebreaks in CSV files: https://github.com/DSpace/DSpace/issues/6600
If you are setting a new embargo date in the CSV, ensure that the embargo lift date is a future date. It's been reported that past dates may cause
DSpace to ignore item changes.
An example heading row for a CSV import file that includes entity relation columns:
id,collection,dc.title,project.investigator,relation.isPersonOfProject,etc,etc,etc.
To link one item to another you need to create a corresponding column of the relation metadata schema, so in our example above relation.isPersonOfProject. All columns of the form relation.*.latestForDiscovery are created and updated automatically, so you should not import them.
If you want to create a new relation, you will not yet know its ID; replace it with a '+' and DSpace will assign one on its own. People can also be removed from the column, or completely new relations can be created for new items, even if there are no old ones to be taken over.
An example heading row for the CSV import file (project entity):
id,collection,dc.title,relation.isPersonOfProject,etc,etc,etc.
Subsequent example row for the CSV import file (project entity):
350,2292,Project title,"d89c1eb1-2e7c-4912-a1eb-f27b17fd6848::virtual::8585::600||e3595b14-6937-47b9-b718-1972cb683943::virtual::+::600"
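Each '||'-separated relation value is itself split on '::' (the default authority separator from bulkedit.cfg). A hypothetical sketch for pulling out the related item's UUID and the relationship ID (the assumed per-value layout is uuid::virtual::relationship-id::confidence; '+' as the relationship ID asks DSpace to create a new relation):

```python
VALUE_SEPARATOR = "||"
AUTHORITY_SEPARATOR = "::"

def parse_relations(cell: str) -> list[dict]:
    """Parse a relation.* CSV cell into its components.

    Assumed layout per value: <uuid>::virtual::<relationship-id>::<confidence>,
    where '+' as the relationship id asks DSpace to create a new relation.
    """
    relations = []
    for value in cell.split(VALUE_SEPARATOR):
        parts = value.split(AUTHORITY_SEPARATOR)
        relations.append({
            "uuid": parts[0],
            "relationship_id": parts[2],   # '+' means: create new
            "is_new": parts[2] == "+",
        })
    return relations

cell = ("d89c1eb1-2e7c-4912-a1eb-f27b17fd6848::virtual::8585::600"
        "||e3595b14-6937-47b9-b718-1972cb683943::virtual::+::600")
parse_relations(cell)
```

Applied to the example row above, the first value refers to an existing relation (ID 8585), while the second requests a brand-new one.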
Batch Metadata Editing Configuration
The Batch Metadata Editing Tool allows the administrator to extract from the DSpace database a set of records for editing via a CSV file. It provides an
easier way of editing large collections.
Configuration File: [dspace]/config/modules/bulkedit.cfg

Property: bulkedit.valueseparator
Informational note: The delimiter used to separate values within a single field. For example, this will place the double pipe between multiple authors appearing in one record (Smith, William || Johannsen, Susan). This applies to any metadata field that appears more than once in a record. The user can change this to another character.

Property: bulkedit.fieldseparator
Informational note: The delimiter used to separate fields (defaults to a comma for CSV). Again, the user could change it to something like '$'. If you wish to use a tab, semicolon, or hash (#) sign as the delimiter, set the value to be tab, semicolon or hash. For example:
bulkedit.fieldseparator = tab

Property: bulkedit.authorityseparator
Informational note: The delimiter used to separate authority data (defaults to a double colon ::).

Property: bulkedit.gui-item-limit
Informational note: When using the Web UI, this sets the limit of the number of items allowed to be edited in one processing. There is no limit when using the CLI.

Property: bulkedit.ignore-on-export
Example Value:
bulkedit.ignore-on-export = dc.date.accessioned, \
dc.date.available, \
dc.date.updated, dc.description.provenance
Informational note: Metadata elements to exclude when exporting via the user interfaces, or when using the command line version and not using the -a (all) option.

Property: bulkedit.allowexpunge
Informational note: Should the 'action' column allow the 'expunge' method. By default this is set to false.

Property: bulkedit.allow-bulk-deletion
Informational note: Comma-separated list of metadata fields that can be deleted in bulk using the 'metadata-deletion' script. By default only the 'dspace.agreements.end-user' field can be deleted in bulk, as doing so allows an Administrator to force all users to re-review the End User Agreement on their next login. However, you may choose to enable additional fields. Keep in mind, any fields listed here may be batch deleted from the Processes UI, and such metadata deletions cannot be undone.
DOI Digital Object Identifier
Persistent Identifier
DOI Registration Agencies
Configure DSpace to use the DataCite API
dspace.cfg
Metadata conversion
Identifier Service
Sending metadata updates to DataCite
DOIs using DataCite and Item Level Versioning
Command Line Interface
'cron' job for asynchronous reservation/registration
Limitations of DataCite DOI support
Configure DSpace to use EZID service for registration of DOIs
Limitations of EZID DOI support
Adding support for other Registration Agencies
Configuring pre-registration of Identifiers
Why mint in submission?
Enable the Identifiers step
Configure filters and behaviour
Administrator registration
Persistent Identifier
It is good practice to use Persistent Identifiers to address items in a digital repository. There are many different systems for Persistent Identifiers: Handle, DOI, urn:nbn, PURL and many more. It is far outside the scope of this document to discuss the differences between all these systems. For several reasons the Handle System is deeply integrated in DSpace, and DSpace makes intensive use of it. With DSpace 3.0 the Identifier Service was introduced, which makes it possible to also use external identifier services within DSpace.
DOIs are Persistent Identifiers like Handles are, but as many big publishing companies use DOIs they are quite well-known to scientists. Some journals
ask for DOIs to link supplemental material whenever an article is submitted. Beginning with DSpace 4.0 it is possible to use DOIs in parallel to the Handle
System within DSpace. By "using DOIs" we mean automatic generation, reservation and registration of DOIs for every item that enters the repository.
These newly registered DOIs will not be used as a means to build URLs to DSpace items. Items will still rely on handle assignment for the item urls.
DataCite is an international initiative to promote science and research, and a member of the International DOI Foundation. The members of DataCite act
as registration agencies for DOIs. Some DataCite members provide their own APIs to reserve and register DOIs; others let their clients use the DataCite
API directly. Starting with version 4.0 DSpace supports the administration of DOIs by using the DataCite API directly or by using the API from EZID (which
is a service of the University of California Digital Library). This means you can administer DOIs with DSpace if your registration agency allows you to use
the DataCite API directly or if your registration agency is EZID.
To use DOIs within DSpace you have to configure several parts of DSpace:
enter your DOI prefix and the credentials to use the API from DataCite in dspace.cfg,
configure the script which generates some metadata,
activate the DOI mechanism within DSpace,
configure a cron job which transmits the information about new and changed DOIs to the registration agency.
dspace.cfg
After you enter into a contract with a DOI registration agency, they'll provide you with user credentials and a DOI prefix. You have to enter these in dspace.cfg. Here is a list of DOI configuration options in dspace.cfg:
Configuration File: [dspace]/config/dspace.cfg

Property: identifier.doi.user
Example Value: identifier.doi.user = user123
Informational Note: Username to log into the API of the DOI registration agency. You'll get it from your DOI registration agency.

Property: identifier.doi.password
Informational Note: Password to log into the API of the DOI registration agency. You'll get it from your DOI registration agency.

Property: identifier.doi.prefix
Informational Note: The prefix you got from the DOI registration agency. All your DOIs start with this prefix, followed by a slash and a suffix generated by DSpace. The prefix can be compared with a namespace within the DOI system.

Property: identifier.doi.namespaceseparator
Informational Note: This property is optional. If you want to use the same DOI prefix in several DSpace installations or with other tools that generate and register DOIs, it is necessary to use a namespace separator. All the DOIs that DSpace generates will start with the DOI prefix, followed by a slash, the namespace separator and some number generated by DSpace. For example, if your prefix is 10.5072 and you want all DOIs generated by DSpace to look like 10.5072/dspace-1023, you have to set the namespace separator to 'dspace-'.

Property: identifier.doi.resolver
Informational Note: URL for the DOI resolver. This will be the stem for generated DOIs. This property is optional and defaults to the public DOI resolver.

Property: crosswalk.dissemination.DataCite.publisher

Property: crosswalk.dissemination.DataCite.hostingInstitution
Informational Note: The name of the organization/institution which hosts this instance of the object. If not configured, it will default to the value of crosswalk.dissemination.DataCite.publisher.

Property: crosswalk.dissemination.DataCite.dataManager
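The prefix / namespace-separator scheme described above can be sketched in a few lines (the prefix, separator, and number are hypothetical example values, not real registrations):

```python
def build_doi(prefix: str, suffix_number: int, namespace_separator: str = "") -> str:
    """Compose a DOI the way the configuration above describes:
    prefix, a slash, the optional namespace separator, then a
    number generated by DSpace.
    """
    return f"{prefix}/{namespace_separator}{suffix_number}"

build_doi("10.5072", 1023, "dspace-")   # -> '10.5072/dspace-1023'
```

With no namespace separator configured, the same call would yield plain '10.5072/1023', which is why the separator is needed to keep several installations sharing one prefix from colliding.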
Metadata conversion
To reserve or register a DOI, DataCite requires that metadata be supplied which describe the object that the DOI addresses. The file [dspace]/config
/crosswalks/DIM2DataCite.xsl controls the conversion of metadata from the DSpace internal format into the DataCite format. If you are running a version of
DSpace earlier than 6.0, you have to add your DOI prefix, namespace separator and the name of your institution to this file:
[dspace]/config/crosswalks/DIM2DataCite.xsl
<!--
Document : DIM2DataCite.xsl
Created on : January 23, 2013, 1:26 PM
Author : pbecker, ffuerste
Description: Converts metadata from DSpace Intermediate Format (DIM) into
metadata following the DataCite Schema for the Publication and
Citation of Research Data, Version 2.2
-->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:dspace="http://www.dspace.org/xmlns/dspace/dim"
xmlns="http://datacite.org/schema/kernel-2.2"
version="1.0">
<!-- DO NOT CHANGE ANYTHING BELOW THIS LINE EXCEPT YOU REALLY KNOW WHAT YOU ARE DOING! -->
...
If you are running DSpace 6.0 or later, then you do not have to change the XSL to change the publisher, data manager or hosting institution. Just change the configured values using the crosswalk.dissemination.DataCite.* properties in local.cfg (see dspace.cfg for examples).
If you want to know more about the DataCite Schema, have a look at the documentation. If you change this file in a way that is not compatible with the
DataCite schema, you won't be able to reserve and register DOIs anymore. Do not change anything if you're not sure what you're doing. To get the XML
on which the XSLT processor will start, use the following command:
To get the XML that will be sent to DataCite, replace 'dim' with 'DataCite'. If the DOI is not stored in the metadata, DSpace will add it automatically as identifier, so don't worry if the XML produced by this command does not contain the DOI. Once the DOI is stored in the metadata, it should also be contained in the XML.
Identifier Service
The Identifier Service manages the generation, reservation and registration of identifiers within DSpace. You can configure it using the config file located in
[dspace]/config/spring/api/identifier-service.xml. In the file you should already find the code to configure DSpace to register DOIs. Just read the comments
and remove the comment signs around the two appropriate beans.
After removing the comment signs the file should look something like this (I removed the comments to make the listing shorter):
[dspace]/config/spring/api/identifier-service.xml
<!--
Copyright (c) 2002-2010, DuraSpace. All rights reserved
Licensed under the DuraSpace License.
-->
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans-2.5.xsd">
<bean id="org.dspace.identifier.service.IdentifierService"
class="org.dspace.identifier.IdentifierServiceImpl"
autowire="byType"
scope="singleton"/>
<bean id="org.dspace.identifier.DOIIdentifierProvider"
class="org.dspace.identifier.DOIIdentifierProvider"
scope="singleton">
<property name="configurationService"
ref="org.dspace.services.ConfigurationService" />
<property name="DOIConnector"
ref="org.dspace.identifier.doi.DOIConnector" />
</bean>
<bean id="org.dspace.identifier.doi.DOIConnector"
class="org.dspace.identifier.doi.DataCiteConnector"
scope="singleton">
<property name='DATACITE_SCHEME' value='https'/>
<property name='DATACITE_HOST' value='mds.test.datacite.org'/>
<property name='DATACITE_DOI_PATH' value='/doi/' />
<property name='DATACITE_METADATA_PATH' value='/metadata/' />
<property name='disseminationCrosswalkName' value="DataCite" />
</bean>
</beans>
If you use other IdentifierProviders besides the DOIIdentifierProvider there will be more beans in this file.
Pay particular attention to the property DATACITE_HOST. By default it is set to the DataCite test server. To reserve real DOIs you will have to change it to mds.datacite.org. Ask your registration agency if you're not sure about the correct address.
Sending metadata updates to DataCite
To send metadata updates to DataCite automatically whenever an item's metadata changes, enable the DOIConsumer in dspace.cfg:
[dspace]/config/dspace.cfg
event.consumer.doi.class = org.dspace.identifier.doi.DOIConsumer
event.consumer.doi.filters = Item+Modify_Metadata
Then you should add 'doi' to the property event.dispatcher.default.consumers. After adding it, this property may look like this:
[dspace]/config/dspace.cfg
event.dispatcher.default.consumers = versioning, discovery, doi, eperson
DOIs using DataCite and Item Level Versioning
If you enabled Item Level Versioning you should enable the VersionedDOIIdentifierProvider instead of the DOIIdentifierProvider. The VersionedDOIIdentifierProvider ensures that newer versions of the same item get a DOI that looks like the DOI of the first version of the item, extended by a dot and the version number. With DSpace 6 this also became the default for Handles if Item Level Versioning is enabled. In the configuration file [dspace]/config/spring/api/identifier-service.xml you'll find the possibility to enable the VersionedDOIIdentifierProvider. If you want to use versioned DOIs, please comment out the DOIIdentifierProvider, as only one of the two DOI providers should be enabled at the same time.
Command Line Interface
The command line interface in general is documented here: Command Line Operations.
The command used for DOIs is 'doi-organiser'. You can use the following options:
-d, --delete-all: Transmit information to the DOI registration agency about all DOIs that were deleted.
--delete-doi DOI: Transmit information to the DOI registration agency that the specified DOI was deleted. The DOI must already be marked for deletion; you cannot use this command to delete a DOI for an existing item.
-l, --list: List all DOIs whose changes have not yet been committed to the registration agency.
-q, --quiet: The doi-organiser sends error reports to the mail address configured in the property alert.recipient in dspace.cfg. If you use this option, no output should be given to stdout. If you do not use this option, the doi-organiser writes information about successful and unsuccessful operations to stdout and stderr. You can find information in dspace.log of course.
--register-doi DOI | ItemID | handle: If a DOI is marked for registration, you can trigger the registration at the DOI registration agency with this command. Specify either the DOI, the ID of the item, or its handle.
-r, --register-all: Transmit information to the DOI registration agency about all DOIs marked for registration.
-s, --reserve-all: Transmit to the DOI registration agency information about all DOIs that should be reserved.
--reserve-doi DOI | ItemID | handle: If a DOI is marked for reservation, you can trigger the reservation at the DOI registration agency with this command. Specify either the DOI, the ID of the item, or its handle.
-u, --update-all: If a DOI is reserved for an item, the metadata of the item will be sent to DataCite. This command transmits new metadata for all items whose metadata were changed since the DOI was reserved.
--update-doi DOI | ItemID | handle: If a DOI needs an update of the metadata of the item it belongs to, you can trigger this update with this command. Specify either the DOI, the ID of the item, or its handle.
Currently you cannot generate new DOIs with this tool. You can only send information about changes in your local DSpace database to the registration agency.

'cron' job for asynchronous reservation/registration
To transmit information about new and changed DOIs to the registration agency regularly, set up a cron job that performs the following steps, in this order:
1. Update the metadata of all items that have changed since their DOI was reserved.
2. Reserve all DOIs marked for reservation.
3. Register all DOIs marked for registration.
4. Delete all DOIs marked for deletion.
In DSpace, a DOI can have the state "registered", "reserved", "to be reserved", "to be registered", "needs update", "to be deleted", or "deleted". After updating an item's metadata, the state of its assigned DOI is set back to the last state it had before. So, for example, if a DOI has the state "to be registered" and the metadata of its item changes, it will be set to the state "needs update". After the update is performed, its state is set to "to be registered" again. Because of this behavior the order of the commands above matters: the update command must be executed before all of the other commands.
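The state behaviour described above can be modelled roughly as follows. This is an illustrative sketch of the documented behaviour, not DSpace's implementation:

```python
# DOI states as described above: a metadata change remembers the
# previous state and restores it once the update has been transmitted.
class Doi:
    def __init__(self, state: str):
        self.state = state
        self._state_before_update = None

    def metadata_changed(self):
        """Item metadata changed: flag the DOI for a metadata update."""
        self._state_before_update = self.state
        self.state = "needs update"

    def update_performed(self):
        """The update was sent to the agency: restore the earlier state."""
        self.state = self._state_before_update

doi = Doi("to be registered")
doi.metadata_changed()      # state is now "needs update"
doi.update_performed()      # state is back to "to be registered"
```

This also shows why the update command must run first: until the update is performed, the DOI sits in "needs update" and its pending registration would be skipped.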
The cron job should perform the following commands with the rights of the user your DSpace installation runs as:
[dspace]/bin/dspace doi-organiser -u -q
[dspace]/bin/dspace doi-organiser -s -q
[dspace]/bin/dspace doi-organiser -r -q
[dspace]/bin/dspace doi-organiser -d -q
The doi-organiser sends error messages as email and logs some additional information. The option -q tells DSpace to be quiet. If you don't use this option
the doi-organiser will print messages to stdout about every DOI it successfully reserved, registered, updated or deleted. Using a cron job these messages
would be sent as email.
In case of an error, consult the log messages. If there is an outage of the API of your registration agency, DSpace will not change the state of the DOIs so
that it will do everything necessary when the cron job starts the next time and the API is reachable again.
The frequency the cron job runs depends on your needs and your hardware. The more often you run the cron job the faster your new DOIs will be
available online. If you have a lot of submissions and want the DOIs to be available really quickly, you probably should run the cron job every fifteen
minutes. If there are just one or two submissions per day, it should be enough to run the cron job twice a day.
To set up the cron job, you just need to run the following command as the dspace UNIX user:
crontab -e
The following line tells cron to run the necessary commands twice a day, at 1am and 1pm. Please note that the entry starting with the numbers is a single line, even if it is shown as multiple lines in your browser.
# Send information about new and changed DOIs to the DOI registration agency:
0 1,13 * * * [dspace]/bin/dspace doi-organiser -u -q ; [dspace]/bin/dspace doi-organiser -s -q ; [dspace]/bin
/dspace doi-organiser -r -q ; [dspace]/bin/dspace doi-organiser -d -q
Limitations of DataCite DOI support
DSpace supports only one DOI prefix and one namespace separator per installation. That means if you want to use other applications or even more than one DSpace installation to register DOIs with the same prefix, you'll have to use a unique namespace separator for each of them. Also, you should not manually generate DOIs with the same prefix and namespace separator you configured within DSpace. For example, if your prefix is 10.5072 you can configure one DSpace installation to generate DOIs starting with 10.5072/papers-, a second installation to generate DOIs starting with 10.5072/data-, and another application to generate DOIs starting with 10.5072/results-.
DOIs will be used in addition to Handles. This implementation does not replace Handles with DOIs in DSpace. That means that DSpace will still generate
Handles for every item, every collection and every community, and will use those Handles as part of the URL of items, collections and communities.
DSpace currently generates DOIs for items only. There is no support to generate DOIs for Communities and collections yet.
When using DSpace's support for the DataCite API, not all information may be restored when using the AIP Backup and Restore feature (see https://github.com/DSpace/DSpace/issues/5203). The DOIs included in the metadata of items will be restored, but DSpace won't update the metadata of those items at DataCite anymore. You may even run into problems when minting new DOIs after you restored older ones using AIP.
Configure DSpace to use EZID service for registration of DOIs
In config/dspace.cfg you will find a small block of settings whose names begin with identifier.doi.ezid. You should uncomment these properties and give them appropriate values. Sample values for a test account are supplied.
name: identifier.doi.ezid.shoulder
meaning: The "shoulder" is the DOI prefix issued to you by the EZID service. DOIs minted by this instance of DSpace will be the concatenation of the "shoulder" and a locally unique token.

name: identifier.doi.ezid.password
meaning: The password used to log into the EZID service.

name: identifier.doi.ezid.publisher
meaning: You may specify a default value for the required datacite.publisher metadatum, for use when the Item has no publisher.

name: crosswalk.dissemination.DataCite.hostingInstitution
meaning: Name of the hosting institution. If not configured, it will be set to the value of crosswalk.dissemination.DataCite.publisher.

name: crosswalk.dissemination.DataCite.dataManager
meaning: Name of the data manager. If not configured, it will be set to the value of crosswalk.dissemination.DataCite.publisher.
Back in config/spring/api/identifier-service.xml you will see some other configuration of the EZIDIdentifierProvider bean. In most situations, the default settings should work well. But here's an explanation of the options available:
EZID Provider / Registrar settings: By default, the EZIDIdentifierProvider is configured to use the CDLib provider (ezid.cdlib.org) in the EZID_SCHEME, EZID_HOST and EZID_PATH settings. In most situations, the default values should work for you. However, you may need to modify these values (especially the EZID_HOST) if you are registered with a different EZID provider. In that situation, please check with your provider for valid "host" and "path" settings. If your provider provides EZID service at a particular path on its host, you may set that in EZID_PATH.
NOTE: As of the writing of this documentation, the default CDLib provider settings should also work for institutions that use Purdue (ezid.lib.purdue.edu) as a provider. Purdue and CDLib currently share the same infrastructure, and both ezid.cdlib.org and ezid.lib.purdue.edu point to the same location.
Metadata mappings: You can alter the mapping between DSpace and EZID metadata, should you choose. The crosswalk property is a map from DSpace metadata fields to EZID fields, and can be extended or changed. The key of each entry is the name of an EZID metadata field; the value is the name of the corresponding DSpace field, from which the EZID metadata will be populated.
Crosswalking / Transforms: You can also supply transformations to be applied to field values using the crosswalkTransform property. Each key is the name of an EZID metadata field, and its value is the name of a Java class which will convert the value of the corresponding DSpace field to its EZID form. The only transformation currently provided is one which converts a date to the year of that date, named org.dspace.identifier.ezid.DateToYear. In the configuration as delivered, it is used to convert the date of issue to the year of publication. You may create new Java classes with which to supply other transformations, and map them to metadata fields here. If an EZID metadatum is not named in this map, the default mapping is applied: the string value of the DSpace field is copied verbatim.
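The crosswalk-with-transform idea can be sketched in Python. The transform mirrors the description of org.dspace.identifier.ezid.DateToYear, but the field names and code below are an illustration, not the shipped Java implementation:

```python
def date_to_year(value: str) -> str:
    """Mimic the DateToYear transform: reduce an ISO date to its year."""
    return value.split("-")[0]

# EZID field -> (DSpace field, optional transform).
# When the transform is None, the value is copied verbatim.
crosswalk = {
    "datacite.title": ("dc.title", None),
    "datacite.publicationyear": ("dc.date.issued", date_to_year),
}

def to_ezid(dspace_metadata: dict) -> dict:
    """Populate EZID metadata from DSpace metadata via the crosswalk map."""
    ezid = {}
    for ezid_field, (dspace_field, transform) in crosswalk.items():
        if dspace_field in dspace_metadata:
            value = dspace_metadata[dspace_field]
            ezid[ezid_field] = transform(value) if transform else value
    return ezid

to_ezid({"dc.title": "A thesis", "dc.date.issued": "2013-01-23"})
# -> {'datacite.title': 'A thesis', 'datacite.publicationyear': '2013'}
```

Extending the mapping then amounts to adding entries to the crosswalk dictionary, which is the same shape of change the Spring configuration asks for.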
Limitations of EZID DOI support
Currently, the EZIDIdentifierProvider has a known issue where it stores its DOIs in the dc.identifier field, instead of using the dc.identifier.uri
field (which is the one used by DataCite DOIs and Handles). See https://github.com/DSpace/DSpace/pull/1006 for more details. This will be corrected in a
future version of DSpace.
DSpace currently generates DOIs for items only. There is no support to generate DOIs for Communities and Collections yet.
Configuring pre-registration of Identifiers

Why mint in submission?
This feature ensures that users can see their future DOI and, if necessary, a warning that if certain conditions are not met, the DOI will not be registered after approval.
Keeping a DOI in pending status does use up an integer from the total DOI namespace, but it also ensures that the submitter, reviewers, administrators etc. know what the DOI will be if it is ever registered in the future.
If this is really not desired, e.g. there are many item types which should never get a DOI, then there is a way to configure a filter that avoids minting a new PENDING DOI at all unless conditions are met in submission.
Property: identifiers.submission.register
Example Value: true
Informational Note: Enable this feature. Default: false. Handles will be registered at time of submission. DOIs (if item filters evaluate to true) will be minted in a "pending" state for items, to be registered or queued for registration at archival.

Property: identifiers.submission.filter.install
Example Value: doi-filter
Informational Note: Bean ID of a logical item filter (see config/modules/spring/api/item-filters.xml) that will be used to evaluate whether a DOI should be queued for registration when this item is installed (archived) in DSpace. This filter will be applied whether or not a "pending" DOI is already minted for the item.

Property: identifiers.submission.filter.workspace
Example Value: always_true_filter
Informational Note: Bean ID of a logical item filter (see config/modules/spring/api/item-filters.xml) that will be used to evaluate whether a DOI should be minted as "pending" for registration when this item is first submitted as a workspace item in DSpace. Depending on the value of identifiers.submission.strip_pending_during_submission, this filter will be checked whenever the workspace item changes, to see if it now qualifies for a DOI. Default: always_true_filter

Property: identifiers.submission.strip_pending_during_submission
Example Value: true
Informational Note: If, during workspace item changes, the workspace filter no longer evaluates to true, should any DOIs be stripped (moved to MINTED or DELETED status)? This is useful in situations where the submitter needs real-time feedback as to whether their item qualifies for a DOI.

Property: identifiers.item-status.register-doi
Example Value: false
Informational Note: Allow administrators to queue DOIs for registration in the Item Status page. Default: false. Important: This configuration property must be set, even if it matches the default, as it is exposed as a REST configuration property to the frontend.
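The interaction of the workspace filter and identifiers.submission.strip_pending_during_submission might be sketched like this. The function and return values are illustrative only; the real decisions are made by the configured item-filter beans:

```python
def workspace_doi_action(filter_passes: bool, has_pending_doi: bool,
                         strip_pending: bool) -> str:
    """What happens to a workspace item's (pending) DOI when the item
    changes, following the configuration described above.
    """
    if filter_passes:
        # Item qualifies: mint a pending DOI if it does not have one yet.
        return "keep pending DOI" if has_pending_doi else "mint pending DOI"
    if has_pending_doi and strip_pending:
        # Item no longer qualifies and stripping is enabled.
        return "strip pending DOI"
    return "no change"
```

With strip_pending enabled, a submitter gets real-time feedback: editing the item so it no longer passes the filter removes the pending DOI, and editing it back re-mints one.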
Administrator registration
If an item does not have a DOI at all, or if an item has a MINTED or PENDING DOI, a user with ADMIN rights over the item may queue the DOI registration from the Item Status page. No filters will be applied to this action. This requires identifiers.item-status.register-doi to be true in the identifiers configuration (see above).
Item Level Versioning
1 What is Item Level Versioning?
2 Important warnings
3 Disabling Item Level Versioning
4 Initial Requirements
5 User Interface
5.1 General behaviour: Linear Versioning
5.2 Creating a new version of an item
5.3 View the history and older versions of an item
6 Architecture
6.1 Versioning model
7 Configuration
7.1 Versioning Service Override
7.2 Identifier Service Override
7.3 Version History Visibility
7.3.1 Hide Editor/Submitter details in version table
7.4 Allowing submitters to version their items
8 Identified Challenges & Known Issues
8.1 Conceptual compatibility with Embargo
8.2 Conceptual compatibility with Item Level Statistics
Item level versioning was not fully supported in DSpace 7.0 (you were only able to view existing versions). It was restored in DSpace 7.1. See DSpace
Release 7.0 Status
Important warnings
Item Level Versioning on Entities configuration
Configurable Entities are supported in Item Level Versioning support starting from version 7.3. More details about the configuration specific to Configurable
Entities can be found on that page.
AIP Backup & Restore functionality only works with the Latest Version of Items
If you are using the AIP Backup and Restore functionality to backup / restore / migrate DSpace Content, you must be aware that the "Item Level
Versioning" feature is not yet compatible with AIP Backup & Restore. Using them together may result in accidental data loss. Currently the AIPs that
DSpace generates only store the latest version of an Item. Therefore, past versions of Items will always be lost when you perform a restore / replace using
AIP tools. See https://github.com/DSpace/DSpace/issues/4751.
DSpace 6+ changed the way Handles are created for versioned Items
Starting with 6.0, the way DSpace creates Handles for versioned Items was changed. If you want to keep the old behavior of DSpace 4 and 5, you have to enable the VersionedHandleIdentifierProviderWithCanonicalHandles in the XML configuration file [dspace]/config/spring/api/identifier-service.xml. See Identifier Service Override below for details and the comments in the configuration file.
Disabling Item Level Versioning
Item Level Versioning can be disabled by setting the following in your dspace.cfg or local.cfg:
versioning.enabled = false
Additionally, you will need to make the following changes to disable all versioning-related features:
Remove the "versioning" consumer from the list of default Event Consumers in either your dspace.cfg or local.cfg. Look for this configuration:
# Remove the "versioning" entry in this list
# (NOTE: Your list of consumers may be different based on the features you've enabled)
#event.dispatcher.default.consumers = versioning, discovery, eperson
# For example:
event.dispatcher.default.consumers = discovery, eperson
Once these changes are made, you will need to restart your servlet container (e.g. Tomcat) for the new settings to take effect.
Initial Requirements
The Item Level Versioning implementation builds on the following requirements identified by the stakeholders who supported this contribution: Initial
Requirements Analysis
User Interface
A new version can be created starting from any available version, but it will always be placed at the end of the version history (it will be the latest).
Only one in-progress version can exist at any time. When a new version has been created and still needs to pass certain steps of the workflow, it is
temporarily impossible to create another new version until the workflow steps are finished and the new version has replaced the previous one.
modules/versioning.cfg
versioning.submitterCanCreateNewVersion=false
1. Click "Create a new version" from the buttons on the right side of the item page.
2. Provide the reason for creating a new version that will later on be stored and displayed in the version summary.
3. Your new version is now created as a new Item in your Workspace. It requires you to go through the submission and workflow steps as you
would for a normal, new submission to the collection. The rationale behind this is that if you are adding new files or metadata, you will also
need to accept the license for them. In addition, the versioning functionality does not bypass any quality control embedded in the workflow
steps.
After the submission steps and the execution of subsequent workflow steps, the new version becomes available in the repository.
Versions can also be managed via the edit item page, in the dedicated versions tab.
View the history and older versions of an item
An overview of the version history, including links to older versions of an item, is available at the bottom of an Item View page. The repository administrator
can decide whether the version history should be available to all users or restricted to administrators. By default, this information is available to all
users. Information displayed includes the version number, Submitter/Editor name (only if enabled), date, and summary/description. As necessary, you
may change the visibility of this table or the "Editor" column using the "Version History Visibility" configurations below.
Architecture
Versioning model
For every new Version a separate DSpace Item will be created that replicates the metadata, bundle and bitstream records. The bitstream records will point
to the same file on the disk.
The Cleanup method has been modified to retain the file if another Bitstream record points to it (the dotted lines in the diagram represent a bitstream
deleted in the new version); in other words, the file will be deleted only if the Bitstream record being processed is the only one to point to the file
(count(INTERNAL_ID)=1).
Configuration
[dspace_installation_dir]/config/spring/api/versioning-service.xml
In this file, you can specify which metadata fields are automatically "reset" (i.e. cleared out) during the creation of a new item version. By default, all
metadata values (and bitstreams) are copied over to the newly created version, with the exception of dc.date.accessioned and dc.description.provenance.
You may specify additional metadata fields to reset by adding them to the "ignoredMetadataFields" property in the "versioning-service.xml"
file:
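As a sketch, the relevant property might look like the following; the bean class and structure are assumptions based on the defaults shipped with DSpace, so verify against your own versioning-service.xml:

```xml
<!-- Sketch of the ignoredMetadataFields property; the bean class name is
     assumed to match the default file shipped with DSpace -->
<bean class="org.dspace.versioning.DefaultItemVersionProvider">
    <property name="ignoredMetadataFields">
        <set>
            <value>dc.date.accessioned</value>
            <value>dc.description.provenance</value>
            <!-- add any additional fields to reset on new versions here -->
        </set>
    </property>
</bean>
```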
The canonical handle will always point to the newest version of an Item. This makes sense if you hide the version history: normal users won't be able to
find older versions and will always see just the newest one. Please keep in mind that older versions can still be accessed by "guessing" the versioned Handle
if you do not remove the read policies manually. The downside of this identifier strategy is that there is no permanent handle to cite the currently newest
version, as it will get a new Handle when a newer version is created.
With DSpace 6, versioned DOIs (using DataCite as DOI registration agency) were added and the default versioned Handle strategy was changed. Starting
with DSpace 6, the VersionedHandleIdentifierProvider creates a handle for the first version of an item. Every newer version gets the same handle
extended by a dot and the version number. To stay with the example from above, the first version of an Item gets the Handle 10673/100, the second version
10673/100.2, the third version 10673/100.3, and so on. This strategy has the downside that there is no handle always pointing to the newest version, but each
version gets an identifier that can be used to cite exactly this version. If page numbers change in newer editions, the old citations stay valid.
In DSpace 4 and 5, only the strategy using canonical handles (one handle that always points to the newest version) was implemented. In DSpace 6, the
strategy of creating a new handle for each version was implemented and became the default. The strategy using canonical handles still exists in DSpace,
but you have to enable the VersionedHandleIdentifierProviderWithCanonicalHandles in the file [dspace]/config
/spring/api/identifier-service.xml. With DSpace 6, versioned DOIs were introduced using the strategy that every new version gets a new DOI
(extended by a dot and the version number for versions >= 2). To use versioned DOIs you have to enable DOIs, you have to use DataCite as
registration agency, and you have to enable the VersionedDOIIdentifierProvider in the named configuration file.
You can configure which persistent identifiers should be used by editing the following XML configuration file, deployed under your DSpace installation directory:
[dspace_installation_dir]/config/spring/api/identifier-service.xml
No changes to this file are required to enable Versioning. This file is currently only relevant if you want to keep the identifier strategy from DSpace 4 and 5
or if you want to enable DOIs or even versioned DOIs.
# Setting this to "true" will hide the entire "Version History" table from
# all users *except* Administrators
versioning.item.history.view.admin=false
One possible solution would be to present an end user with aggregated statistics across all viewers, and give administrators the possibility to view
statistics per version.
Mapping/Linking Items to multiple Collections
Introduction
Using the Item Mapper
Implications
Mapping collection vs Owning collection
Mapping an item does not modify access rights
Introduction
The Item Mapper is a tool in the DSpace web user interface that allows repository managers to display the same item in multiple collections at once. Thanks
to this feature, a repository manager is not forced to duplicate items to display them in different collections.
Implications
Mapping collection vs Owning collection
The relation between an item and the collection in which it is mapped is different from the relation that this item has with the collection to which it was
originally submitted. This second collection is referred to as the "owning" collection. When an item is deleted from the owning collection, it automatically
disappears from the mapping collection. From within the mapping collection, the only thing that can be deleted is the mapping relation. Removing this
mapping relation does not affect the presence of the item in the owning collection.
Metadata Recommendations
1 Recommended Metadata Fields
2 Local Fields
Title (dc.title)
When submitting an Item via the DSpace web user interface, this field is required.
If you add an Item to DSpace through another means (SWORD, etc.), it is recommended to specify a title for the Item. Without a title, the
Item will show up in DSpace as "Untitled".
Publication Date (dc.date.issued)
When submitting an Item via the DSpace web user interface, this field is required (by default).
However, your System Administrator can choose to enable the "Initial Questions" step within the Submission User Interface. Enabling
this step will cause the following to occur: If the item is said to be "published", then the Publication Date will be required.
If the item is said to be "unpublished", then the Publication Date will be auto-set to today's date (date of submission). WARNING:
Google Scholar has recommended against automatically assigning this "dc.date.issued" field to the date of submission, as it
often results in incorrect dates in Google Scholar results. See https://github.com/DSpace/DSpace/issues/4850 and
https://github.com/DSpace/DSpace/issues/5112 for more details.
If you add an Item to DSpace through another means (SWORD, etc.), it is recommended to specify the date on which the Item was
published, in ISO-8601 format (e.g. 2007, 2008-01, or 2011-03-04). This ensures DSpace can accurately report the publication date to services
like Google Scholar. If an item is unpublished, you can either choose to leave this blank, or pass in the literal string "today" (which will tell
DSpace to automatically set it to the date of ingest).
As of DSpace 4.0, the system will not assign a "dc.date.issued" when unspecified. Previous versions of DSpace (3.0 or below) would set
"dc.date.issued" to the date of accession (dc.date.accessioned), if it was unspecified during ingest.
If you are adding content to DSpace without using the DSpace web user interface, there are two recommended options for assigning
"dc.date.issued":
If the item was previously published, please set "dc.date.issued" to the date of publication in ISO-8601 format (e.g. 2007, 2008-01,
or 2011-03-04).
If the item has never been previously published, you may set "dc.date.issued='today'" (the literal string "today"). This will cause
DSpace to automatically assign "dc.date.issued" to the date of accession (dc.date.accessioned), as it did previously.
You can also choose to leave "dc.date.issued" unspecified, but then the new Item will have an empty date within
DSpace.
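For example, when ingesting via the Simple Archive Format, the issued date travels in the dublin_core.xml file of each item directory; a minimal sketch (all values are illustrative):

```xml
<dublin_core>
  <dcvalue element="title">A Sample Item</dcvalue>
  <!-- ISO-8601 publication date; use the literal string "today" for
       never-published items to get the date of accession -->
  <dcvalue element="date" qualifier="issued">2011-03-04</dcvalue>
</dublin_core>
```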
Obviously, we recommend specifying as much metadata as you can about a new Item. For a full list of supported metadata fields, please see: Metadata
and Bitstream Format Registries
Local Fields
You may encounter situations in which you will require an appropriate place to store information that does not immediately fit with the description of a field
in the default registry. The recommended practice in this situation is to create new fields in a separate schema. You can choose your own name and prefix
for this schema such as local. or myuni.
It is generally discouraged to use any of the fields from the default schema as a place to store information that doesn't correspond with the field's
description. This is especially true if you are ever considering the option to open up your repository metadata for external harvesting.
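As an illustration, such a schema and field could be declared in a metadata registry file; the sketch below mirrors the format of [dspace]/config/registries/dublin-core-types.xml, and the namespace, field name, and scope note are entirely hypothetical:

```xml
<dspace-dc-types>
  <dc-schema>
    <name>local</name>
    <namespace>http://dspace.example.edu/namespace/local</namespace>
  </dc-schema>
  <dc-type>
    <schema>local</schema>
    <element>internalNote</element>
    <scope_note>Hypothetical field for remarks that fit no default-schema field</scope_note>
  </dc-type>
</dspace-dc-types>
```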
Moving Items
1 Moving Items via Web UI
2 Moving Items via the Batch Metadata Editor
Login as an Administrator
Browse/search for the item.
Click "Edit this Item" on the item page
When editing an item, on the 'Edit item' screen, click the "Move..." button.
Search for the new Collection for the item to appear in. By default, when the item is moved, it will take its authorizations (who can READ / WRITE
it) with it.
If you wish for the item to take on the default authorizations of the destination collection, tick the 'Inherit policies' checkbox. This is useful
if you are moving an item from a private collection to a public collection, or from a public collection to a private collection.
Note: When selecting the 'Inherit policies' option, ensure that this will not override system-managed authorizations such as those
imposed by the embargo system.
PDF Citation Cover Page
Enabling PDF Cover Pages may affect your site's visibility in Google Scholar (and similar search engines)
Google Scholar specifically warns against automatically generating PDF Cover Pages, as they can break the metadata extraction techniques used by their
search engine. Be aware that enabling PDF Cover Pages may also cause those items to no longer be indexed by Google Scholar. For more information,
please see the "Indexing Repositories: Pitfalls and Best Practices" talk from Anurag Acharya (co-creator of Google Scholar) presented at the Open
Repositories 2015 conference.
A known issue with the current implementation of the PDF Citation Cover Page is that primarily only English/Roman characters are supported. This is due
to a limitation in the tool used to generate PDFs. See https://github.com/DSpace/DSpace/issues/5590 for more details on this issue.
Adding a cover page with additional citation information to documents retrieved from DSpace has been a sought-after feature, as documents uploaded to the
repository may have had their context stripped from them when reduced to a bare PDF. Context that might have surrounded the document includes the
journal, publisher, edition, and more. Without that information, the document might be just a few pages of text, with no way to piece its origin together. Since
repository policy might be to include this information as metadata on the Item, this metadata can be added to the citation cover page, so that the derivative
PDF includes all of this information.
The citation cover page works by storing only the original PDF in DSpace and generating the citation-cover-page PDF on the fly. An alternative setup
would be to run the PDF Citation Coverpage Curation Task on the DSpace repository contents, and then disseminate the pre-generated citation version
instead of generating it on the fly.
As of DSpace 6.0, the configuration file for this feature was renamed from disseminate-citation.cfg to citation-page.cfg. The renaming was
to clarify the purpose of this configuration file, as its previous name was misleading / confusing to some users.
In addition, all configurations below have now been prefixed with "citation-page" (e.g. the enable_globally configuration has been renamed to
citation-page.enable_globally).
In the {dspace.dir}/config/modules/citation-page.cfg file review the following fields to make sure they are uncommented:
Property: citation-page.enable_globally
Informational Note: Boolean to determine whether citation functionality is enabled globally for the entire site. This will enable the citation cover page generator for all PDFs.
Default: disabled
Property: citation-page.enabled_collections
Informational Note: List of collection handles to enable the cover page generator for bitstreams within.
Default: blank
Property: citation-page.enabled_communities
Informational Note: List of community handles to enable the cover page generator for bitstreams within.
Default: blank
Property: citation-page.citation_as_first_page
Informational Note: Should the citation page be the first page of the cover (true), or the last page (false)?
Default: true (first page)
Property: citation-page.header1
Informational Note: First row of the header, perhaps for the institution / university name. Commas separate multiple sections of the header (see screenshot above).
Default Value: DSpace Institution
Property: citation-page.header2
Informational Note: Second row of the header, perhaps your DSpace instance's branded name and the URL of your DSpace. A comma is used to separate the instance name and the URL.
Property: citation-page.fields
Example Value: citation-page.fields = dc.date.issued, dc.title, dc.creator, dc.contributor.author, dc.publisher, _line_, dc.identifier.citation, dc.identifier.uri
Informational Note: Metadata fields to display on the citation PDF. Specify in schema.element.qualifier form, and separate fields with a comma. If you want a horizontal line break, use _line_.
Property: citation-page.footer
Example Value: citation-page.footer = Downloaded from Scholar Archive at University of Higher Education\, an open access institutional repository. All Rights Reserved.
Informational Note: Footer text at the bottom of the citation page. It might be some type of license or copyright information, or just letting the recipient know where they downloaded the file from.
Default Value: Downloaded from DSpace Repository\, DSpace Institution's institutional repository
NOTE: any commas appearing in this text should be escaped as "\,". See example above.
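Putting the properties above together, a local.cfg override might look like the following sketch (the handles, institution names, and URL are placeholders, not real values):

```
citation-page.enable_globally = true
# Or, instead of enabling globally, enable per collection/community
# (handles below are hypothetical)
#citation-page.enabled_collections = 10673/12, 10673/34
#citation-page.enabled_communities = 10673/5
citation-page.citation_as_first_page = true
citation-page.header1 = My University
citation-page.header2 = My Repository\, https://dspace.example.edu
citation-page.fields = dc.date.issued, dc.title, dc.contributor.author, _line_, dc.identifier.uri
citation-page.footer = Downloaded from My Repository\, all rights reserved.
```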
Updating Items via Simple Archive Format
1 Item Update Tool
1.1 DSpace Simple Archive Format
1.2 ItemUpdate Commands
1.2.1 CLI Examples
For metadata, ItemUpdate can perform 'add' and 'delete' actions on specified metadata elements. For bitstreams, 'add' and 'delete' are similarly available.
All these actions can be combined in a single batch run.
ItemUpdate supports an undo feature for all actions except bitstream deletion. There is also a test mode, as with ItemImport. However, unlike ItemImport,
there is no resume feature for incomplete processing. There is more extensive logging with a summary statement at the end with counts of successful and
unsuccessful items processed.
One probable scenario for using this tool is where there is an external primary data source for which the DSpace instance is a secondary or down-stream
system. Metadata and/or bitstream content changes in the primary system can be exported to the simple archive format to be used by ItemUpdate to
synchronize the changes.
A note on terminology: item refers to a DSpace item. metadata element refers generally to a qualified or unqualified element in a schema in the form
[schema].[element].[qualifier] or [schema].[element], and occasionally in a more specific way to the second part of that form. metadata field
refers to a specific instance pairing a metadata element to a value.
The user is referred to the previous section DSpace Simple Archive Format.
Additionally, a delete_contents file is now available. This file lists the bitstreams to be deleted, one bitstream ID per line. Currently, no other
identifiers for bitstreams are usable for this function. This file is an addition to the Archive format specifically for ItemUpdate.
The optional suppress_undo file is a flag to indicate that the 'undo archive' should not be written to disk. This file is usually written by the application in an
undo archive to prevent a recursive undo. This file is an addition to the Archive format specifically for ItemUpdate.
ItemUpdate Commands
Arguments (short and long forms) and descriptions:

-a or --addmetadata [metadata element]: Repeatable for multiple elements. The metadata element should be in the form dc.x or dc.x.y. The mandatory argument indicates the metadata fields in the dublin_core.xml file to be added unless already present (multiple fields should be separated by a semicolon ';'). However, duplicate fields will not be added to the item metadata without warning or error.

-d or --deletemetadata [metadata element]: Repeatable for multiple elements. All metadata fields matching the element will be deleted.

-A or --addbitstreams: Adds bitstreams listed in the contents file with the bitstream metadata cited there.
-D or --deletebitstreams [filter plug classname or alias]: Not repeatable. With no argument, this operation deletes bitstreams listed in the delete_contents file. Only bitstream IDs are recognized identifiers for this operation. The optional filter argument is the classname of an implementation of the org.dspace.app.itemupdate.BitstreamFilter class to identify files for deletion, or one of the aliases (e.g. ORIGINAL, ORIGINAL_AND_DERIVATIVES, TEXT, THUMBNAIL) which reference existing filters based on membership in a bundle of that name. In this case, the delete_contents file is not required for any item. The filter properties file will contain properties pertinent to the particular filter used. Multiple filters are not allowed.

-i or --itemfield: Specifies the metadata field that contains the item's identifier. Default value is "dc.identifier.uri". (Optional)

-t or --test: Runs the process in test mode with logging, but no changes are applied to the DSpace instance. (Optional)

-P or --provenance: Prevents any changes to the provenance field to represent changes in the bitstream content resulting from an Add or Delete. In other words, when this flag is specified, no new provenance information is added to the DSpace Item when adding/deleting a bitstream. No provenance statements are written for thumbnails or text derivative bitstreams, in keeping with the practice of MediaFilterManager. (Optional)

-F or --filter-properties: The filter properties file to be used by the delete bitstreams action. (Optional)
CLI Examples
Adding Metadata:
This will update all DSpace Items listed in your archive directory, adding a new dc.description metadata field. Items will be located in DSpace based
on the handle found in dc.identifier.uri (since the -i argument wasn't used, the default metadata field, dc.identifier.uri, from the dublin_core.xml file in the
archive folder is used).
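A hedged sketch of such an invocation follows; the eperson address and archive path are placeholders, so check `dspace itemupdate --help` for the exact options available in your version:

```shell
# Add dc.description values from each item's dublin_core.xml,
# matching items by the handle in dc.identifier.uri
[dspace]/bin/dspace itemupdate -e admin@myu.edu -s /path/to/archive -a dc.description
```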
Managing Community Hierarchy
1 Sub-Community Management
Sub-Community Management
Reindex content for new permissions to take effect
After moving or changing an existing Community hierarchy, it is important to reindex your content. Moving a Community under a new parent may result in
the inheritance of new/different permissions from that new parent Community. These new permissions will not take effect until you reindex your
content. Keep in mind, you may not need to reindex all content, but may be able to simply reindex the content under the new parent Community.
DSpace provides an administrative tool, 'CommunityFiliator', for managing community sub-structure. It has two operations: establishing a community-to-
sub-community relationship, or dis-establishing an existing relationship.
The familiar parent/child metaphor can be used to explain how it works. Every community in DSpace can be either a 'parent' community (meaning it has at
least one sub-community), a 'child' community (meaning it is a sub-community of another community), both, or neither. In these terms, an 'orphan' is a
community that lacks a parent (although it can be a parent); 'orphans' are referred to as 'top-level' communities in the DSpace user interface, since there is
no parent community 'above' them. The first operation, establishing a parent/child relationship, can take place between any community and an orphan.
The second operation, removing a parent/child relationship, will make the child an orphan.
where '-s' or '--set' means establish a relationship whereby the community identified by the '-p' parameter becomes the parent of the community identified
by the '-c' parameter. Both the 'parentID' and 'childID' values may be handles or database IDs.
where '-r' or '--remove' means dis-establish the current relationship in which the community identified by 'parentID' is the parent of the community identified
by 'childID'. The outcome will be that the 'childID' community becomes an orphan, i.e. a top-level community.
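The two operations can be sketched from the command line as follows; the handles are placeholders, and the exact option spellings should be verified with `dspace community-filiator --help` on your installation:

```shell
# Establish: community 10673/2 becomes the parent of orphan community 10673/3
[dspace]/bin/dspace community-filiator -s -p 10673/2 -c 10673/3

# Dis-establish: community 10673/3 becomes a top-level (orphan) community again
[dspace]/bin/dspace community-filiator -r -p 10673/2 -c 10673/3
```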
If the required constraints of an operation are violated, an error message will appear explaining the problem, and no change will be made. An example is a
removal operation where the stated child community does not have the stated parent community as its parent: "Error, child community not a child of
parent community".
It is possible to effect arbitrary changes to the community hierarchy by chaining the basic operations together. For example, to move a child community
from one parent to another, simply perform a 'remove' from its current parent (which will leave it an orphan), followed by a 'set' to its new parent.
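The move described above could be sketched as follows, assuming a child community 10673/4 being moved from parent 10673/2 to parent 10673/3 (all handles hypothetical):

```shell
# Step 1: remove from the current parent (10673/4 becomes an orphan)
[dspace]/bin/dspace community-filiator -r -p 10673/2 -c 10673/4
# Step 2: attach it to the new parent
[dspace]/bin/dspace community-filiator -s -p 10673/3 -c 10673/4
```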
It is important to understand that when any operation is performed, all the sub-structure of the child community follows it. Thus, if a child has itself children
(sub-communities), or collections, they will all move with it to its new 'location' in the community tree.
ORCID Integration
Since DSpace 7.3, a bidirectional ORCID integration is available in DSpace. This feature allows for authentication via ORCID, as well as synchronizing
data between ORCID and DSpace, via the usage of Researcher Profiles.
Acknowledgments
The ORCID integration was originally developed by 4Science in DSpace-CRIS. It is the result of years of collaboration with several institutions and the
ORCID team, which has helped to correct, improve and broaden the scope of the integration. We want to thank the University of Hong Kong, which was the first
institution to fund development activities in this regard back in 2015, and the TUHH Hamburg University of Technology, which funded the initial porting of
the ORCID integration to the new Angular/REST architecture introduced in DSpace 7. Last but not least, funds have been received from the DSpace
community to port this feature from DSpace-CRIS to DSpace.
Overview
User features
Login via ORCID
Connect/Disconnect the local profile to ORCID
Import publications from ORCID
Configuration
Enable the integration
Configure the push of information from DSpace to ORCID
Mapping of the DSpace Person Items to ORCID Works
Mapping of DSpace Publication items to ORCID Works
Mapping of DSpace Project items to ORCID Funding
Configure the import features
Configure the author lookup in submission
Troubleshooting & common issues
I'm having trouble testing the ORCID integration. What should I check?
I cannot find the ORCID features described by this page in my installation
I'm unable to authenticate via ORCID
After logging in via ORCID, a new DSpace account was created instead of using my existing DSpace account
I'm having trouble creating test accounts on ORCID to experiment with the features
I have configured my Public ORCID API credentials in DSpace but I get an error attempting to login via ORCID
I don't find my publications looking up for my ORCID iD
I cannot push all my publications, only few or none of them are listed in the queue
Push of publications to ORCID fails
Push of projects to ORCID fails
Overview
DSpace provides a bidirectional integration with ORCID based on the ORCID API v3.0. Both the Public ORCID API and the Membership API are
supported.
The table below summarizes the supported features according to the type of ORCID API configured.
Authentication
* No credentials: please note that ORCID strongly recommends applying at least for a free public API Key, as this will help to troubleshoot integration
problems and get support from ORCID. There is also a chance of getting better performance/priority over "unknown" clients.
User features
Connect/Disconnect the local profile to ORCID
The researcher can connect (or disconnect) their DSpace local Researcher Profile with ORCID from the Person item detail page.
Once a profile has been connected, they can manage their synchronization preferences, deciding what should be pushed to ORCID, including:
biographic data
Publication (entities) linked with their Researcher Profile. (Publication entities are synced to Works in ORCID.)
Project (entities) linked with their Researcher Profile. (Project entities are synced to Fundings in ORCID.)
NOTE: The ORCID synchronization feature is disabled by default, even when ORCID Authentication is enabled. See Configuration section below for how
to enable it.
The synchronization can happen automatically overnight or manually. The list of information that should be pushed or updated from DSpace to ORCID
is presented in a queue and can be manually discarded or immediately pushed by the researcher.
Configuration
# These URLs are for testing against ORCID's Sandbox API
# These are only useful for testing, and you must first request a Sandbox API Key from ORCID
orcid.domain-url= https://sandbox.orcid.org
orcid.api-url = https://api.sandbox.orcid.org/v3.0
orcid.public-url = https://pub.sandbox.orcid.org/v3.0
# Keep in mind, these API keys MUST be for the Sandbox API if you use "sandbox.orcid.org" URLs above!
orcid.application-client-id = <YOUR-SANDBOX-ORCID-CLIENT-ID>
orcid.application-client-secret = <YOUR-SANDBOX-ORCID-CLIENT-SECRET>
# Once you are ready to switch to Production, you need to update these settings to use ORCID's production API
# See https://github.com/ORCID/ORCID-Source/tree/master/orcid-api-web#endpoints
# orcid.domain-url= https://orcid.org
# orcid.api-url = https://api.orcid.org/v3.0
# orcid.public-url = https://pub.orcid.org/v3.0
# DON'T FORGET TO UPDATE YOUR API KEY! It must be a valid Public or Member API Key
# orcid.application-client-id = <YOUR-PRODUCTION-ORCID-CLIENT-ID>
# orcid.application-client-secret = <YOUR-PRODUCTION-ORCID-CLIENT-SECRET>
Enable in Production: To enable the main integration (i.e. connect a local profile with ORCID and push data to the ORCID registry) you MUST
be an ORCID Member, get a Member API Key, and properly enable and configure the feature in DSpace. See also "How do I register for Member
API credentials?" from ORCID.
Enable in Testing: To test ORCID integration, it's possible to use the ORCID Sandbox (without being an ORCID member). However, to do so,
you must request a Sandbox Member API Key. See also "How do I register a public api client?".
Setting the "redirect URL" in ORCID: In the ORCID API Credentials request form you will be asked to enter one or more redirect URLs for your
application (DSpace). You will need to enter here the root URLs of your REST and user interfaces, which could be different. If the root URLs of
both are the same, then just enter the URL of your user interface.
For example, for the DSpace 7 official demo, we use these redirect URLs:
User Interface: https://demo7.dspace.org (please note the absence of a /home or any subpaths)
REST API: https://api7.dspace.org (please note the absence of a /server or any subpaths)
For more information on valid ORCID redirect URLs, see "How do redirect URIs work?" from ORCID.
Configure the Client ID and Client Secret in DSpace: Once ORCID has reviewed and approved your request, you will get from them the Client
ID and Client Secret that need to be set in the local.cfg, among other properties. See the configuration examples above.
Please note that by default DSpace will request permissions to READ and WRITE all the information from the ORCID profile, as this will enable support for
all of the features. You can fine-tune that by overriding the following properties. Please note that if you are going to configure Public API Credentials, you
MUST update this configuration, keeping only the /authenticate scope, as all the other scopes require the Member API.
# The scopes to be granted by the user during the login on ORCID
# (see https://info.orcid.org/faq/what-is-an-oauth-scope-and-which-scopes-does-orcid-support/)
orcid.scope = /authenticate
# The below scopes are ONLY valid if you have a Member API Key.
# They should be commented out if you only have a Public API Key
orcid.scope = /read-limited
orcid.scope = /activities/update
orcid.scope = /person/update
To enable ORCID Authentication you need to uncomment the following line in the modules/authentication.cfg file or add it to your local.cfg
plugin.sequence.org.dspace.authenticate.AuthenticationMethod = org.dspace.authenticate.OrcidAuthentication
Please note that you are NOT required to enable ORCID Authentication to use the other ORCID features, including the synchronisation ones. It is also
possible to use just ORCID Authentication without enabling all the other features.
When a user logs in via ORCID, the system will attempt to reuse an existing account by looking up the email address. If none is found, then a new account is created in
DSpace. It is possible to disable the creation of new accounts by setting the following property:
authentication-orcid.can-self-register = false
To enable ORCID Synchronization, you need to uncomment the following or add it to your local.cfg
# the properties below are required only for the sync / linking part (not for authentication or import)
orcid.synchronization-enabled = true
# you need to enable the orcidqueue consumer to keep track of what needs to be synced between DSpace and ORCID
event.dispatcher.default.consumers = versioning, discovery, eperson, orcidqueue
The push of DSpace data (Person, Publication, Project) to ORCID is based on mappings defined in the config/modules/orcid.cfg file. You will find
details below in the dedicated paragraphs.
The ORCID Synchronization features depend on other features that must be enabled: the DSpace User Profile and Configurable Entities (at least the Person, Publication, Project and OrgUnit entities).
The synchronization features are classified as experimental as of 7.3 and MUST be enabled manually. Due to the strict validation rules applied on the ORCID side and the absence of a friendly edit UI for archived items in DSpace (see issue #2876), it is hard at this time to achieve an optimal UX.
- Funding Agency (Organisation)
- Currency, if an Amount is provided
- Country
To provide more meaningful messages to the user, DSpace implements a local validation before trying to push the record to ORCID. This validation verifies the data using the rules above so that a specific message can be displayed to the user. If for any reason another error is returned by ORCID, a generic message is shown to the user, and the exact technical message received from ORCID is logged in the dspace.log file and stored in the orcidhistory table.
The local validation can be turned off; however, turning it off is mainly intended as a development/debug option and should not be done in production.
orcid.validation.work.enabled = true
orcid.validation.funding.enabled = true
<!-- Configuration of ORCID profile sections factory.
Each entry of sectionFactories must be an implementation of OrcidProfileSectionFactory.-->
<bean class="org.dspace.orcid.service.impl.OrcidProfileSectionFactoryServiceImpl">
<constructor-arg name="sectionFactories">
<list>
<bean class="org.dspace.orcid.model.factory.impl.OrcidSimpleValueObjectFactory">
<constructor-arg name="sectionType" value="OTHER_NAMES" />
<constructor-arg name="preference" value="BIOGRAPHICAL" />
<property name="metadataFields" value="${orcid.mapping.other-names}" />
</bean>
<bean class="org.dspace.orcid.model.factory.impl.OrcidSimpleValueObjectFactory">
<constructor-arg name="sectionType" value="KEYWORDS" />
<constructor-arg name="preference" value="BIOGRAPHICAL" />
<property name="metadataFields" value="${orcid.mapping.keywords}" />
</bean>
<bean class="org.dspace.orcid.model.factory.impl.OrcidSimpleValueObjectFactory">
<constructor-arg name="sectionType" value="COUNTRY" />
<constructor-arg name="preference" value="BIOGRAPHICAL" />
<property name="metadataFields" value="${orcid.mapping.country}" />
</bean>
<bean class="org.dspace.orcid.model.factory.impl.OrcidPersonExternalIdentifierFactory">
<constructor-arg name="sectionType" value="EXTERNAL_IDS" />
<constructor-arg name="preference" value="IDENTIFIERS" />
<property name="externalIds" value="${orcid.mapping.person-external-ids}" />
</bean>
<bean class="org.dspace.orcid.model.factory.impl.OrcidSimpleValueObjectFactory">
<constructor-arg name="sectionType" value="RESEARCHER_URLS" />
<constructor-arg name="preference" value="IDENTIFIERS" />
<property name="metadataFields" value="${orcid.mapping.researcher-urls}" />
</bean>
</list>
</constructor-arg>
</bean>
The above configuration links each piece of information that can be synchronized from DSpace to ORCID with a preference that the user can manage on the DSpace side (e.g. sync of the keywords is linked to the BIOGRAPHICAL preference) and defines which DSpace metadata will be used to fill the ORCID field. The bean reads the metadata mapping from config/modules/orcid.cfg.
A DSpace "Publication" item is pushed to ORCID as a "Work" using the org.dspace.orcid.model.factory.impl.OrcidWorkFactory configured
via this mapping bean:
In the above configuration the "simple" properties are mapped by matching the ORCID field name on the left (e.g. short-description) with the DSpace metadata field that holds that information (e.g. dc.description.abstract). For the ORCID Type field a special "converter" is configured so that the value of the DSpace metadata (e.g. dc.type) is mapped to the controlled list of types accepted by ORCID (https://info.orcid.org/faq/what-work-types-does-orcid-support/). The value of orcid.mapping.work.type.converter matches the name of a bean defined in config/spring/api/orcid-services.xml.
Finally, a special treatment is needed for the external ids, as this is a complex field on the ORCID side composed of two values: the identifier type (from a controlled list) and the identifier value. In this case the configuration maps a DSpace metadata field (e.g. dc.identifier.doi) to a specific identifier type (the part after ::, e.g. doi).
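As a sketch of that syntax, the mapping lines in config/modules/orcid.cfg look like the following (these example pairs are illustrative; verify the exact defaults against your own copy of the file):

```
orcid.mapping.work.external-ids = dc.identifier.doi::doi
orcid.mapping.work.external-ids = dc.identifier.issn::issn
```

Each additional line adds another DSpace metadata field / ORCID identifier-type pair.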
<bean id="orcidFundingFactoryFieldMapping" class="org.dspace.orcid.model.OrcidFundingFieldMapping" >
<property name="contributorFields" value="${orcid.mapping.funding.contributors}" />
<property name="externalIdentifierFields" value="${orcid.mapping.funding.external-ids}" />
<property name="titleField" value="${orcid.mapping.funding.title}" />
<property name="typeField" value="${orcid.mapping.funding.type}" />
<property name="typeConverter" ref="${orcid.mapping.funding.type.converter}" />
<property name="amountField" value="${orcid.mapping.funding.amount}" />
<property name="amountCurrencyField" value="${orcid.mapping.funding.amount.currency}" />
<property name="amountCurrencyConverter" ref="${orcid.mapping.funding.amount.currency.converter}" />
<property name="descriptionField" value="${orcid.mapping.funding.description}" />
<property name="startDateField" value="${orcid.mapping.funding.start-date}" />
<property name="endDateField" value="${orcid.mapping.funding.end-date}" />
<property name="organizationRelationshipType" value="${orcid.mapping.funding.organization-relationship-type}" />
</bean>
In the above configuration the "simple" properties are mapped by matching the ORCID field name on the left (e.g. description) with the DSpace metadata field that holds that information (e.g. dc.description). For the ORCID Type field a special "converter" is configured so that the value of the DSpace metadata (e.g. dc.type) is mapped to the controlled list of types accepted by ORCID (https://support.orcid.org/hc/en-us/articles/360006894674-Metadata-in-the-Funding-section). The value of orcid.mapping.funding.type.converter matches the name of a bean defined in config/spring/api/orcid-services.xml. The same applies to the currency: orcid.mapping.funding.amount.currency.converter = mapConverterDSpaceToOrcidAmountCurrency
Finally, a special treatment is needed for the Funder, which is a mandatory field on the ORCID side. In this case the mapping defines which relation is used to link the Project with the Funder (OrgUnit):
orcid.mapping.funding.organization-relationship-type = isOrgUnitOfProject
The Import features from ORCID have been implemented using the Live Import Framework.
The following bean is used to configure the import of person records from ORCID. It is activated as an external source in config/spring/api/external-services.xml.
The mapping between the ORCID Person and the DSpace Person Item is the following, currently hard-coded:

ORCID               DSpace
Name/FamilyName     person.familyName
Name/GivenName      person.givenName
Name/Path           person.identifier.orcid
The following bean is instead used to configure the import of publication records from ORCID (Work)
The mapping of the ORCID Work metadata to the DSpace metadata is performed by the following bean in config/spring/api/orcid-services.xml
<bean id="orcidPublicationDataProviderFieldMapping" class="org.dspace.orcid.model.OrcidWorkFieldMapping" >
    <property name="contributorFields" value="${orcid.external-data.mapping.publication.contributors}" />
    <property name="externalIdentifierFields" value="${orcid.external-data.mapping.publication.external-ids}" />
    <property name="publicationDateField" value="${orcid.external-data.mapping.publication.issued-date}" />
    <property name="titleField" value="${orcid.external-data.mapping.publication.title}" />
    <property name="journalTitleField" value="${orcid.external-data.mapping.publication.is-part-of}" />
    <property name="shortDescriptionField" value="${orcid.external-data.mapping.publication.description}" />
    <property name="languageField" value="${orcid.external-data.mapping.publication.language}" />
    <property name="languageConverter" ref="${orcid.external-data.mapping.publication.language.converter}" />
    <property name="typeField" value="${orcid.external-data.mapping.publication.type}" />
    <property name="typeConverter" ref="${orcid.external-data.mapping.publication.type.converter}" />
</bean>
Please note that this mapping is separate from the mapping used to push information from DSpace to ORCID but usually, as provided in the default configuration, the two mappings are the same.
orcid.external-data.mapping.publication.description = dc.description.abstract
orcid.external-data.mapping.publication.issued-date = dc.date.issued
orcid.external-data.mapping.publication.language = dc.language.iso
orcid.external-data.mapping.publication.language.converter = mapConverterOrcidToDSpaceLanguageCode
orcid.external-data.mapping.publication.is-part-of = dc.relation.ispartof
orcid.external-data.mapping.publication.type = dc.type
orcid.external-data.mapping.publication.type.converter = mapConverterOrcidToDSpacePublicationType
- via an ORCID lookup authority, available for a DSpace repository that is not using Configurable Entities
- via a relation between the research output item (Publication, etc.) and a Person Item bound to the ORCID Person External Source defined in the previous paragraph
I'm having trouble testing the ORCID integration. What should I check?
Please double-check the documentation and the other FAQs to be sure that you have followed all of the instructions to enable the integration correctly. If you still have trouble, contact the DSpace tech community via email or Slack, providing as much detail as possible. If the issue is related to the synchronization of DSpace local data with ORCID, it is useful to share information about the content of your orcidhistory table and any relevant messages you may have in the dspace.log file.
After logging in via ORCID, a new DSpace account was created instead of using my existing DSpace account
Currently, the ORCID integration with DSpace relies on a matching email address to find your existing account. If your ORCID account and DSpace account have different email addresses associated with them, then it is possible that a new (duplicate) user account will be created. If you are encountering this issue, you'll see a message to that effect in your dspace.log file on the backend.
I'm having trouble creating test accounts on ORCID to experiment with the features
Please refer to the ORCID trouble-shooting guide https://info.orcid.org/documentation/integration-guide/troubleshooting/ A frequent mistake working with
the ORCID sandbox environment is to forget that only email addresses @mailinator.com are allowed for account created on the sandbox. Remember to
validate your email address once the account as been created visiting the online inbox at mailinator.com
I have configured my Public ORCID API credentials in DSpace but I get an error attempting to login via
ORCID
When you use public ORCID API credentials you can only use a subset of the integration features (check). Moreover, you need to limit the scopes (permissions) requested from the user via the ORCID authentication to the /authenticate scope. Please check the Enable the integration section above.
I cannot push all my publications; only a few or none of them are listed in the queue
Please double-check that the orcidqueue consumer has been enabled (in dspace.cfg or local.cfg) and that the ORCID settings of your profile have the "All publications" checkbox flagged. ORCID features require the use of the new Configurable Entities. Only Publication items are synchronized with ORCID; simple "untyped" Items will not be synchronized. Please consider converting your legacy collections to "Publication" collections and setting a dspace.entity.type = Publication metadata value on your legacy items.
Researcher Profiles
The DSpace Researcher Profile feature was introduced in DSpace 7.3 to support the ORCID Integration work, but it can also be used on its own. It is turned off by default and must be enabled manually.
A DSpace Researcher Profile is a special Person Entity (item) that is linked with exactly one EPerson (DSpace account). This linked EPerson owns the profile (Person Item), including having WRITE permission on it. The link between the Person Item and the EPerson is managed in the Person's dspace.object.owner metadata field. This field is configured to hold authority values and will contain the UUID of the EPerson that owns the profile.
Profiles require Configurable Entities to be enabled, as every Researcher Profile is represented by a Person Entity.
A profile can be linked to only one EPerson (user account). That EPerson has full rights to manage the profile, including whether the profile is
publicly visible or private.
Optionally, Profiles can be synchronized (or initially created) via ORCID Integration.
When the feature is enabled, the user can create a researcher profile from their Profile (account) page.
If a Person Item matching the account email address already exists in the system, this Person Item is offered to the user:
Once a profile has been created or claimed, the user can make it public (Anonymous READ) or private:
By default, deletion of the researcher profile does NOT delete the corresponding Person Item. Instead, it just unlinks the Person Item from the EPerson
account. This behavior can be changed as specified in the Advanced configuration section below.
researcher-profile.entity-type = Person
You can specify a different Entity Type for the item used as the profile. This is an advanced setting -- change it only if you know what you are doing and have implemented specific customisations.
You also need to enable the EPerson authority for the dspace.object.owner field. Uncomment the following lines in config/modules/authority.cfg:
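In recent 7.x releases those lines look like the following sketch (the property names here should be verified against your own authority.cfg, as they may differ between versions):

```
choices.plugin.dspace.object.owner = EPersonAuthority
choices.presentation.dspace.object.owner = suggest
authority.controlled.dspace.object.owner = true
```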
Last, you need to ensure that at least one Collection is configured to accept Person entities. Only EPersons with submission rights in such a Collection will be able to create profiles. There are many possibilities for using these settings to control who may or may not create a profile.
Advanced configuration
You can configure some aspects of the Profile feature in the config/modules/researcher-profile.cfg
Property: researcher-profile.entity-type
Example Value: Person
Informational Note: The type of Entity to use for Researcher Profile items. By default, the Person Entity is used, as this is provided out-of-the-box in DSpace. This would only need to be modified if you have created a heavily customized Configurable Entities data model which does NOT include Person.

Property: researcher-profile.collection.uuid
Example Value: [collection-uuid]
Informational Note: UUID of the Collection where all Researcher Profiles should be created by default. This Collection MUST be configured to accept Person Entities (or the entity type specified in "researcher-profile.entity-type"). By default this is UNSPECIFIED. The default behavior is that the person's Researcher Profile will be created in the Collection in DSpace which is configured to accept Person Entities and where the user has permissions to submit. If multiple Collections of this type are available, then the first one found will be used.

Property: researcher-profile.hard-delete.enabled
Example Value: false
Informational Note: Whether to enable "hard delete" when a Researcher Profile is deleted by an EPerson. When "hard delete" is enabled (set to true), then anytime an EPerson deletes their Researcher Profile, the underlying Person Entity will be deleted (i.e. this acts as a permanent deletion). When "hard delete" is disabled (set to false, the default value), then anytime an EPerson deletes their Researcher Profile, it will simply be "unlinked". In other words, the underlying Person Entity will be kept in the system.

Property: researcher-profile.set-new-profile-visible
Example Value: false
Informational Note: Whether to make a new Researcher Profile "visible" (i.e. allow anonymous access) on creation. When set to "false" (default value), a newly created Researcher Profile will only be accessible to the EPerson who created it. That EPerson may choose to make it visible (i.e. allow anonymous access) at a later time. When set to "true", a newly created Researcher Profile will be immediately accessible to anonymous users. But the EPerson who created it may choose to hide it (i.e. disallow anonymous access) at a later time.
Troubleshooting
The user sees an error when they try to create their profile
The feature requires that the Person entity be configured in the data model (see Configurable Entities) and that the user has permission to submit in at least one collection configured to accept Person entities. Please double-check that the EPersonAuthority is bound to the dspace.object.owner metadata field -- see the Enable the feature section above.
Statistics and Metrics
SOLR Statistics
DSpace Google Analytics Statistics
Exchange usage statistics with IRUS
SOLR Statistics
DSpace uses the Apache Solr application to power its statistics. Solr enables performant searching over, and appending to, vast amounts of (usage) data. Unlike previous versions, enabling statistics in DSpace does not require additional installation or customization; all the necessary software is included.
In addition to the already existing logging of pageviews and downloads, DSpace also logs search queries users enter in the DSpace search dialog and
workflow events.
DSpace 7.x and 8.x do not yet support all features
In DSpace 7.x & 8.x, only usage statistics (pageviews, downloads) are logged. Search statistics and workflow reports (which were available in 6.x and below) are not yet supported. See their related tickets: https://github.com/DSpace/DSpace/issues/2880 and https://github.com/DSpace/DSpace/issues/2851
Workflow Events logging
Only workflow events initiated and executed by a physical user are logged. Automated workflow steps or ingest procedures are currently not logged by the workflow events logger.
The logging happens on the server side and, unlike Google Analytics, does not require JavaScript to provide usage data. The definition of which fields are stored happens in the file dspace/solr/statistics/conf/schema.xml.
Although they are stored in the same index, the stored fields for views, search queries and workflow events are different. A dedicated field, statistics_type, determines which kind of usage event you are dealing with. The three possible values for this field are view, search and workflow.
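For example, a Solr query restricted to pageview events on items could combine this field with the resource type (treating the value 2 for items as an assumption based on DSpace's resource-type constants; verify against org.dspace.core.Constants):

```
statistics_type:view AND type:2
```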
<field name="type" type="integer" indexed="true" stored="true" required="true" />
<field name="id" type="integer" indexed="true" stored="true" required="true" />
<field name="ip" type="string" indexed="true" stored="true" required="false" />
<field name="time" type="date" indexed="true" stored="true" required="true" />
<field name="epersonid" type="integer" indexed="true" stored="true" required="false" />
<field name="continent" type="string" indexed="true" stored="true" required="false"/>
<field name="country" type="string" indexed="true" stored="true" required="false"/>
<field name="countryCode" type="string" indexed="true" stored="true" required="false"/>
<field name="city" type="string" indexed="true" stored="true" required="false"/>
<field name="longitude" type="float" indexed="true" stored="true" required="false"/>
<field name="latitude" type="float" indexed="true" stored="true" required="false"/>
<field name="owningComm" type="integer" indexed="true" stored="true" required="false" multiValued="true"/>
<field name="owningColl" type="integer" indexed="true" stored="true" required="false" multiValued="true"/>
<field name="owningItem" type="integer" indexed="true" stored="true" required="false" multiValued="true"/>
<field name="dns" type="string" indexed="true" stored="true" required="false"/>
<field name="userAgent" type="string" indexed="true" stored="true" required="false"/>
<field name="isBot" type="boolean" indexed="true" stored="true" required="false"/>
<field name="referrer" type="string" indexed="true" stored="true" required="false"/>
<field name="uid" type="uuid" indexed="true" stored="true" default="NEW" />
<field name="statistics_type" type="string" indexed="true" stored="true" required="true" default="view" />
The combination of type and id determines which resource (either community, collection, item page or file download) has been requested.
Disabling statistical tracking currently must be done by modifying the backend's Spring configuration in [dspace]/config/spring/rest/event-service-listeners.xml. In that file, you must comment out the "SolrLoggerUsageEventListener" in order to disable all tracking.
event-service-listeners.xml
<beans>
...
<!-- Comment out this bean, as shown below, to disable all tracking of usage statistics in Solr -->
<!-- Inject the SolrLoggerUsageEventListener into the EventService -->
<!--
<bean class="org.dspace.statistics.SolrLoggerUsageEventListener">
<property name="eventService" ref="org.dspace.services.EventService"/>
</bean>
-->
</beans>
After commenting out that bean, you will need to restart Tomcat.
NOTE: This only disables tracking statistics in Solr. The "Statistics" link will still appear in the header menu of the User Interface. However, you can limit its visibility by setting it to only be visible to administrative users. Update this configuration in your local.cfg or usage-statistics.cfg:
At this time, there is no flag to remove the "Statistics" menu link completely. See https://github.com/DSpace/DSpace/issues/9698
If you are not seeing the menu, it is likely only enabled for administrators in your installation. Change the configuration parameter "authorization.admin.usage" in usage-statistics.cfg to false in order to make statistics visible to all repository visitors.
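The property referenced above can be set in your local.cfg; for example, to keep the pageview/download statistics visible to administrators only:

```
usage-statistics.authorization.admin.usage = true
```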
Home page
Starting from the repository homepage, the statistics page displays the top 10 most popular items of the entire repository.
Search Query Statistics
Only supported in DSpace 6 and below
Search query statistics are only supported in DSpace 6.x and below at this time. The below screenshots and instructions are for 6.x and will need updating
if this feature is ported to later versions of DSpace. See https://github.com/DSpace/DSpace/issues/2880
In the UI, search query statistics can be accessed from the lower end of the navigation menu.
If you are not seeing the link labelled "search statistics", it is likely that they are only enabled for administrators in your installation. Change the
configuration parameter "authorization.admin.search" in usage-statistics.cfg to false in order to make statistics visible for all repository visitors.
The dropdown on top of the page allows you to modify the time frame for the displayed statistics.
The Pageviews/Search column tracks the number of pages visited after a particular search term. Therefore, a zero in this column means that after executing a search for a specific keyword, not a single user clicked a single result in the list.
If you are using Discovery, note that clicking the facets also counts as a search, because clicking a facet sends a search query to the Discovery index.
Workflow Event statistics are only supported in DSpace 6.x and below at this time. The below screenshots and instructions are for 6.x and will need
updating if this feature is ported to later versions of DSpace. See https://github.com/DSpace/DSpace/issues/2851
In the UI, workflow statistics can be accessed from the lower end of the navigation menu.
If you are not seeing the link labelled "Workflow statistics", it is likely that they are only enabled for administrators in your installation. Change the
configuration parameter "authorization.admin.workflow" in usage-statistics.cfg to false in order to make statistics visible for all repository visitors.
The dropdown on top of the page allows you to modify the time frame for the displayed statistics.
Architecture
The DSpace Statistics Implementation is a Client/Server architecture based on Solr for collecting usage events in the User Interface or REST API
applications of DSpace. Solr must be installed separately from DSpace.
Property: solr-statistics.server
Informational Note: Used by the SolrLogger client class to connect to the Solr server over HTTP and perform updates and queries. In most cases, this can (and should) be set to localhost (or 127.0.0.1).
To determine the correct path, you can use a tool like wget to see where Solr is responding on your server. For example, you'd want to send a query to Solr like the following:
wget http://127.0.0.1/solr/statistics/select?q=*:*
Assuming you get an HTTP 200 OK response, then you should set solr-statistics.server to the '/statistics' URL of 'http://127.0.0.1/solr/statistics' (essentially removing the "/select?q=*:*" query off the end of the responding URL).
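For reference, a typical value (the default Solr port 8983 is an assumption here; adjust to wherever Solr responds on your server) would look like:

```
solr-statistics.server = http://localhost:8983/solr/statistics
```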
Property: solr-statistics.query.filter.bundles
Example Value: solr-statistics.query.filter.bundles = ORIGINAL
Informational Note: A comma-separated list that contains the bundles for which the file statistics will be displayed.
Property: solr-statistics.query.filter.spiderIp
Informational Note: If true, statistics queries will filter out spider IPs -- use with caution, as this often results in extremely long query strings.

Property: solr-statistics.query.filter.isBot
Informational Note: If true, statistics queries will filter out events flagged with the "isBot" field. This is the recommended method of filtering spiders from statistics.

Property: solr-statistics.autoCommit
Informational Note: If true (default), then all view statistics will be committed to Solr whenever the next autoCommit is triggered. This is recommended behavior. If false, then view statistics will be committed to Solr immediately (i.e. via an explicit commit call). This setting is untested in production scenarios, and is primarily used by automated integration tests (to verify that the statistics engine is working properly).
Property: solr-statistics.spiderips.urls
Example Value:
solr-statistics.spiderips.urls = http://iplists.com/google.txt, \
http://iplists.com/inktomi.txt, \
http://iplists.com/lycos.txt, \
http://iplists.com/infoseek.txt, \
http://iplists.com/altavista.txt, \
http://iplists.com/excite.txt, \
http://iplists.com/misc.txt
Informational Note: List of URLs to download spider files into [dspace]/config/spiders. These files contain lists of known spider IPs and are utilized by the SolrLogger to flag usage events with an "isBot" field, or ignore them entirely.
The "stats-util" command can be used to force an update of spider files, regenerate "isBot" fields on indexed events, and delete spiders from the index. For usage, run:
dspace stats-util -h
In the {dspace.dir}/config/modules/usage-statistics.cfg file, review the following fields. These fields can be edited in place, or overridden in your own local.cfg config file (see Configuration Reference).
Property: usage-statistics.dbfile
Informational Note: References the location of the installed GeoLite or DB-IP City "mmdb" database file. This file is utilized by the LocationUtils to calculate the location of client requests based on IP address.
NOTE: This database file MUST be downloaded, installed and updated using third-party tools. See the "Managing the City Database File" section below.
Property: usage-statistics.resolver.timeout
Example Value: usage-statistics.resolver.timeout = 200
Informational Note: Timeout in milliseconds for DNS resolution of origin hosts/IPs. Setting this value too high may result in Solr exhausting your connection pool.

Informational Note: Will cause statistics logging to look for the X-Forwarded-For header to detect the IPs of clients that have accessed DSpace through a proxy service (e.g. the Apache mod_proxy). Allows detection of the client IP when accessing DSpace. [Note: This setting is found in the DSpace Logging section of dspace.cfg]
Property: usage-statistics.authorization.admin.usage
Informational Note: When set to true, only general administrators, collection and community administrators are able to access the pageview and download statistics from the web user interface. As a result, the links to access statistics are hidden from users not logged in as administrators. Setting this property to "false" will display the links to access statistics to anyone, making them publicly available.

Property: usage-statistics.authorization.admin.search
Informational Note: When set to true, only system, collection or community administrators are able to access statistics on search queries.

Property: usage-statistics.authorization.admin.workflow
Informational Note: When set to true, only system, collection or community administrators are able to access statistics on workflow events.

Property: usage-statistics.logBots
Informational Note: When this property is set to false and an IP is detected as a spider, the event is not logged. When this property is set to true, the event will be logged with the "isBot" field set to true. (See solr-statistics.query.filter.* for query filter options.)
Property: usage-statistics.shardedByYear
Informational Note: When set to "true", the DSpace statistics engine will look for additional Solr shards (per year) when compiling all usage statistics. Therefore, if you are regularly running "stats-util -s" (as documented in the "Solr Sharding By Year" section of the "SOLR Statistics Maintenance" page), then you should set this to "true". By default, it is "false", which tells the statistics engine to only compile usage statistics based on what is found in the current Solr core.
Search query statistics are only supported in DSpace 6.x and below at this time. See https://github.com/DSpace/DSpace/issues/2852
Older versions of DSpace featured static reports generated from the log files. They still persist in DSpace today but are completely independent of the SOLR-based statistics.
The following configuration parameters applicable to these reports can be found in dspace.cfg.
# should the stats be publicly available? should be set to false if you only
# want administrators to access the stats, or you do not intend to generate
# any
report.public = false
These fields are not used by the new 1.6 Statistics; they relate only to the statistics from previous DSpace releases.
Statistics Administration
Anonymizing Statistics
DSpace provides a commandline script (./dspace anonymize-statistics) which allows you to anonymize your statistics to better comply with GDPR and
similar privacy regulations.
The script will anonymise the IP values by rewriting ("masking") the last part. This mask is configurable, both for IPv4 and IPv6 addresses.
For IPv4 addresses, the last number will be replaced by the mask, defined by the configuration key "anonymise_statistics.ip_v4_mask", which defaults to "254". For example, 109.74.16.171 is rewritten as 109.74.16.254.
For IPv6 addresses, the last two numbers will be replaced by the mask, defined by the configuration key "anonymise_statistics.ip_v6_mask", which defaults to "FFFF:FFFF". For example, 2001:0db8:85a3:0000:0000:8a2e:0370:7334 is rewritten as 2001:0db8:85a3:0000:0000:8a2e:FFFF:FFFF.
For each anonymised record, the DNS field is also replaced by "anonymised".
The script only processes records older than 90 days. This period can be altered with the configuration key "anonymise_statistics.time_limit" (expressed in days) in usage-statistics.cfg.
"-s [sleep]" : The script takes an optional parameter ‘-s [sleep]’ (expressed in ms), which will make the Java thread sleep between the calls to Solr
to reduce the load impact.
"-t [threads]" : The Solr service commit mechanism is also optimised by adding multi-threading support. The script takes an optional parameter ‘-t
[threads]’ to indicate how many threads the Solr service can use for this, if not given the thread count defaults to 2.
Statistical records can also be anonymised the moment they are created. This feature can be enabled by setting the configuration parameter "anonymise_statistics.anonymise_on_log" to true in "usage-statistics.cfg". When this configuration property is not set, the feature is disabled by default.
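Putting the options above together, a typical manual run might look like the following sketch (the flag values are illustrative, not required):

```shell
# Anonymize eligible records (older than the configured time limit),
# sleeping 10 ms between Solr calls and allowing 4 commit threads.
[dspace]/bin/dspace anonymize-statistics -s 10 -t 4
```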
Custom Reporting - Querying SOLR Directly
When the web user interface does not offer you the statistics you need, you can greatly expand the reports by querying the SOLR index directly.
Resources
https://www.safaribooksonline.com/library/view/apache-solr-enterprise/9781782161363/
https://lucidworks.com/blog/faceted-search-with-solr/
Examples
http://localhost:8983/solr/statistics/select?indent=on&version=2.2&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=&facet=true&facet.field=epersonid&q=type:0
Explained:
<lst name="facet_counts">
<lst name="facet_fields">
<lst name="epersonid">
<int name="66">1167</int>
<int name="117">251</int>
<int name="52">42</int>
<int name="19">36</int>
<int name="88">20</int>
<int name="112">18</int>
<int name="110">9</int>
<int name="96">0</int>
</lst>
</lst>
</lst>
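As another sketch (adjust host/port to your Solr; field names follow the stock DSpace statistics schema, where type:2 denotes item views), a facet over the countries generating item views can be requested directly with curl:

```shell
# Count item views (type:2) grouped by country code, returning JSON
curl "http://localhost:8983/solr/statistics/select?q=type:2&rows=0&facet=true&facet.field=countryCode&wt=json&indent=on"
```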
Either install a copy of MaxMind's GeoLite City database (in MMDB format)
Installing MaxMind GeoLite2 is free. However, you must sign up for a (free) MaxMind account in order to obtain a license key to use the
GeoLite2 database.
You will need to arrange regular downloads of the GeoLite2 database. MaxMind offers an updater tool (geoipupdate) to do the
downloading/updating, and a number of Linux distributions package it (as geoipupdate). You will still need to configure your license
key prior to usage. Use it before restarting DSpace, to get an up-to-date database.
Once the "GeoLite2-City.mmdb" database file is installed on your system, you will need to configure its location as the value of usage-statistics.dbfile in your local.cfg configuration file.
NOTE: This file is frequently updated by MaxMind.com, so you will need to refresh it regularly (ideally by scheduling the updater tool via
a cron job or similar). As this is written, the database is updated monthly, and to be allowed to obtain it you need to agree to keep your
copy updated.
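For example, the local.cfg entry and an update schedule might look like this (the path and cron schedule are illustrative, not required values):

```
# local.cfg entry pointing DSpace at the downloaded database:
#   usage-statistics.dbfile = ${dspace.dir}/config/GeoLite2-City.mmdb

# crontab entry refreshing the database weekly via MaxMind's updater:
0 3 * * 0 /usr/bin/geoipupdate
```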
Or, you can alternatively use/install DB-IP's City Lite database (in MMDB format)
This database is also free to use, but does not require an account to download.
You will need to arrange regular downloads of the City Lite database. DB-IP offers an updater tool (dbip-update) to do the downloading
/updating, but it requires PHP to run.
Once the "dbip-city-lite.mmdb" database file is installed on your system, you will need to configure its location as the value of usage-statistics.dbfile in your local.cfg configuration file.
NOTE: This file is frequently updated by DB-IP.com, so you will need to refresh it regularly (ideally by scheduling the updater tool via a cron job or similar). As this is written, the database is updated monthly, with the latest available at https://db-ip.com/db/download/ip-to-city-lite
SOLR Statistics Maintenance
1 DSpace Log Converter
2 Filtering and Pruning Spiders
3 Export SOLR records to intermediate format for import into another tool/instance
4 Export SOLR statistics, for backup and moving to another server
5 Import SOLR statistics, for restoring lost data or moving to another server
6 Reindex SOLR statistics, for upgrades or whenever the Solr schema for statistics is changed
7 Upgrade Legacy DSpace Object Identifiers (pre-6x statistics) to DSpace 6x UUID Identifiers
8 Solr Sharding By Year
8.1 Technical implementation details
8.2 Testing Solr Shards
The Log Converter program converts log files from dspace.log into an intermediate format that can be inserted into Solr.
-m or --multiple : Adds a wildcard at the end of input and output, so if -i dspace.log -m was specified, dspace.log* would be converted (i.e. all of the following: dspace.log, dspace.log.1, dspace.log.2, dspace.log.3, etc.)
-n or --newformat : If the log files have been created with DSpace 1.6 or newer
-h or --help : Help
The command loads the intermediate log files that have been created by the aforementioned script into Solr.
Arguments (short and long forms) and descriptions:
-m or --multiple : Adds a wildcard at the end of the input, so dspace.log* would be imported
-s or --skipdns : Skip the reverse DNS lookups that work out where a user is from. (The DNS lookup finds information about the host from its IP address, such as geographical location. This can be slow, and would not work on a server not connected to the internet.)
-l or --local : For developers: allows you to import a log file from another system. Because the handles won't exist, it looks up random items in your local system to add hits to instead.
-h or --help : Help
Although the DSpace Log Converter applies basic spider filtering (googlebot, yahoo slurp, msnbot), it is far from complete. Please refer to Filtering and Pruning Spiders for spider removal operations after converting your old logs.
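A typical two-step run might look like the following sketch. Note that the -i (input) and -o (output) flags shown here are assumptions based on the stock launcher; check `dspace stats-log-converter -h` on your version before relying on them:

```shell
# 1. Convert all rotated dspace.log* files into the intermediate format
[dspace]/bin/dspace stats-log-converter -i [dspace]/log/dspace.log -o [dspace]/log/stats.log -m -n
# 2. Import the intermediate files into the Solr statistics core
[dspace]/bin/dspace stats-log-importer -i [dspace]/log/stats.log -m
```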
Command used: [dspace]/bin/dspace stats-util
-r or --remove-deleted-bitstreams : While indexing the bundle names, remove the statistics about deleted bitstreams
-u or --update-spider-files : Update spider IP files from the internet into [dspace]/config/spiders. Downloads spider files identified in dspace.cfg under property solr.spiderips.urls. See Configuration settings for Statistics
-f or --delete-spiders-by-flag : Delete spiders in Solr by isBot flag. Will prune out all records that have isBot:true
-i or --delete-spiders-by-ip : Delete spiders in Solr by IP address, DNS name, or agent name. Will prune out all records that match spider identification patterns.
-m or --mark-spiders : Update isBot flag in Solr. Marks any records currently stored in statistics that have IP addresses matched in spiders files
Notes:
Which of these options you use is up to you. If you want to keep spider entries in your repository, you can just mark them using "-m" and they will be excluded from statistics queries when "solr.statistics.query.filter.isBot = true" is set in dspace.cfg. If you want to keep the spiders out of the Solr repository entirely, just use the "-i" option and they will be removed immediately.
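For example, a conservative maintenance pass that refreshes the spider lists and then marks (rather than deletes) matching records:

```shell
# Download the latest spider IP lists into [dspace]/config/spiders
[dspace]/bin/dspace stats-util --update-spider-files
# Flag matching records as bots (isBot:true) without deleting them
[dspace]/bin/dspace stats-util --mark-spiders
```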
Spider IPs are specified in files containing one pattern per line. A line may be a comment (starting with "#" in column 1), empty, or a single IP address or
DNS name. If a name is given, it will be resolved to an address. Unresolvable names are discarded and will be noted in the log.
There are guards in place to control what can be defined as an IP range for a bot. In [dspace]/config/spiders, spider IP address ranges have to be at least 3 subnet sections in length (e.g. 123.123.123), and IP ranges can only be on the smallest subnet [123.123.123.0 - 123.123.123.255]. If not, loading that row will cause exceptions in the DSpace logs and exclude that IP entry.
Spiders may also be excluded by DNS name or Agent header value. Place one or more files of patterns in the directories [dspace]/config/spiders
/domains and/or [dspace]/config/spiders/agents. Each line in a pattern file should be either empty, a comment starting with "#" in column 1, or
a regular expression which matches some names to be recognized as spiders.
Export SOLR records to intermediate format for import into another tool/instance
Command used: [dspace]/bin/dspace stats-util
-e or --export Export SOLR view statistics data to usage statistics intermediate format
This exports the records to [dspace]/temp/usagestats_0.csv. The export chunks the output into new files every 10,000 records. These files can be imported with stats-log-importer into SOLR Statistics.
Export SOLR statistics, for backup and moving to another server
Command used: [dspace]/bin/dspace solr-export-statistics
-i or --index-name : optional, the name of the index to process. "statistics" is the default. "authority" can also be exported.
-d or --directory : optional, directory to use for storing the exported files. By default, [dspace]/solr-export is used. If that is not appropriate (due to storage concerns), we recommend you use this option to specify a more appropriate location.
-f or --force-overwrite : optional, overwrite export file if it exists (DSpace 6.1 and later)
Import SOLR statistics, for restoring lost data or moving to another server
Command used: [dspace]/bin/dspace solr-import-statistics
-i or --index-name : optional, the name of the index to process. "statistics" is the default. "authority" can also be imported.
-c or --clear : optional, clears the contents of the existing stats core before importing
-d or --directory : optional, directory which contains the files for importing. By default, [dspace]/solr-export is used. If that is not appropriate (due to storage concerns), we recommend you use this option to specify a more appropriate location.
Reindex SOLR statistics, for upgrades or whenever the Solr schema for statistics is changed
Command used: [dspace]/bin/dspace solr-reindex-statistics
Java class: org.dspace.util.SolrImportExport
Arguments (short and long forms) and descriptions:
-k or --keep : optional, tells the script to keep the intermediate export files for possible later use (by default all exported files are removed at the end of the reindex process).
-d or --directory : optional, directory to use for storing the exported files (temporarily, unless you also specify --keep, see above). By default, [dspace]/solr-export is used. If that is not appropriate (due to storage concerns), we recommend you use this option to specify a more appropriate location. Not sure about your space requirements? You can estimate the space required by looking at the current size of [dspace]/solr/statistics
NOTE: solr-reindex-statistics is safe to run on a live site. The script stores incoming usage data in a temporary SOLR core, and then merges that
new data into the reindexed data when the reindex process completes.
Upgrade Legacy DSpace Object Identifiers (pre-6x statistics) to DSpace 6x UUID Identifiers
This command was introduced in DSpace 7.0 and will be included in the DSpace 6.4 release as well.
It is recommended that all DSpace instances with legacy identifiers perform this one-time upgrade of legacy statistics records.
This action is safe to run on a live site. As a precaution, it is recommended that you back up your statistics shards before performing this action.
Note: a link to this section of the documentation should be added to the DSpace 6.4 Release Notes. (It is already noted in the DSpace 7.0 Upgrading
DSpace page, step 11d)
The DSpace 6x code base changed the primary key for all DSpace objects from an integer id to UUID identifiers. Statistics records that were created
before upgrading to DSpace 6x contain the legacy identifiers.
While the DSpace user interfaces make some attempt to correlate legacy identifiers with UUID identifiers, it is recommended that users perform this one-time upgrade of legacy statistics records.
If you have sharded your statistics repository, this action must be performed on each shard.
Arguments (short and long forms) and descriptions:
-i or --index-name : Optional, the name of the index to process. "statistics" is the default
NOTE: This process will rewrite most solr statistics records and may temporarily double the size of your statistics repositories.
If a UUID value cannot be found for a legacy id, the legacy id will be converted to the form "xxxx-unmigrated" where xxxx is the legacy id.
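The launcher invocation itself is not repeated above; in stock DSpace the upgrade is run as follows (the command name is assumed from the standard launcher, so verify it is listed by `[dspace]/bin/dspace -h` on your release):

```shell
# One-time upgrade of legacy integer object IDs to UUIDs in usage statistics.
# If you have sharded your statistics, run this once per shard.
[dspace]/bin/dspace solr-upgrade-statistics-6x
```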
-s or --shard-solr-index : Splits the data in the main Solr core up into a separate core for each year. This can improve the performance of Solr.
Notes:
Yearly Solr sharding is a routine that can drastically improve the performance of your DSpace SOLR statistics. It was introduced in DSpace 3.0 and is not
backwards compatible. The routine decreases the load created by the logging of new usage events by reducing the size of the SOLR Core in which new
usage data are being logged. By running the script, you effectively split your current SOLR core, containing all of your usage events, into different SOLR
cores that each contain the data for one year. If your DSpace has been logging usage events for less than one year, you will see no notable performance improvement until you run the script after the start of a new year. Both writing new usage events and read operations should be more performant across several smaller SOLR shards than with one monolithic core.
It is highly recommended that you execute this script once at the start of every year. To ensure this is not forgotten, you can include it in your crontab or
other system scheduling software. Here's an example cron entry (just replace [dspace] with the full path of your DSpace installation):
# At 12:00AM on January 1, "shard" the DSpace Statistics Solr index. Ensures each year has its own Solr index
- this improves performance.
0 0 1 1 * [dspace]/bin/dspace stats-util -s
After running the statistics shard process, the "View Usage Statistics" page(s) in DSpace will not automatically recognize the new shard.
Restart tomcat to ensure that the new shard is recognized & included in usage statistics queries.
Repair of Shards Created Before DSpace 5.7 or DSpace 6.1
If you ran the shard process before upgrading to DSpace 5.7 or DSpace 6.1, the multi-value fields such as owningComm and owningColl are likely to be corrupted. Previous versions of the shard process lost the multi-valued nature of these fields. Without the multi-valued nature of these fields, it is difficult to query for statistics records by community / collection / bundle.
You can verify this problem in the solr admin console by looking at the owningComm field on existing records and looking for the presence of "\\," within
that field.
# Strip the stray escape sequences ("\,") left in multi-value fields
for file in *
do
sed -E -e "s/[\\]+,/,/g" -i "$file"
done
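The substitution in the loop above can be sanity-checked on a throwaway sample before touching real export files (a minimal sketch):

```shell
# A corrupted multi-value field looks like "10\\,20\\,30"; the substitution
# collapses each run of backslashes followed by a comma into a plain comma.
sample='10\\,20\\,30'
fixed=$(printf '%s\n' "$sample" | sed -E -e "s/[\\]+,/,/g")
printf '%s\n' "$fixed"   # 10,20,30
```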
5. For each shard that was exported, run the following import
If you repeat the query that was run previously, the fields containing "\\," should now contain an array of owning community ids.
Shard Naming
Prior to the release of DSpace 6.1, the shard names created were off by one year in timezones with a positive offset from GMT.
Shards created subsequent to this release may appear to skip by one year.
The actual sharding of the original Solr core into individual cores by year is done in the shardSolrIndex method of the org.dspace.statistics.SolrLogger class. The sharding is done by first running a facet on the time field to get the facets split by year. Once we have the years from the logs, we query the main Solr data server for all information on each year and download these as CSVs. When we have all data for one year, we upload it to the newly created core for that year using the update CSV handler. Once all data for a year have been uploaded, those data are removed from the main Solr core (done this way so that, if Solr crashes, we do not need to start from scratch).
A bug exists in the DSpace 6.0 release that prevents tomcat from starting when multiple shards are present.
To address this issue, the initialization of SOLR shards is deferred until the first SOLR related requests are processed.
Testing Solr Shards
These notes detail how to test and manipulate SOLR statistics shards.
Note that the multi-value field is corrupted if you import by this manner.
It is possible to pass CSV import parameters using curl.
Note that existing shards use the statistics directory as an "instance" directory.
The new shard can be queried like the other ones.
DSpace Google Analytics Statistics
Google Analytics Support
Enabling Google Analytics
Configuring Google Analytics
Google Analytics Reports in DSpace UI
Configuration settings for Google Analytics Statistics
When Google Analytics is disabled, you will see 404 responses returned from the REST API whenever the User Interface attempts to access ${dspace.server.url}/api/config/properties/google.analytics.key. This is expected behavior, as that 404 response is the REST API telling the User Interface that Google Analytics is not configured. When the UI sees that 404 from the REST API, it disables Google Analytics tracking in the UI.
Property: google.analytics.buffer.limit
Informational Note: Maximum number of events held in the buffer to send to Google Analytics. Used in conjunction with the "cron" setting below.
Property: google.analytics.cron
Informational Note: REQUIRED if you want to send file download events to Google Analytics (where they will be tracked as Google "events"). This defines the schedule for how frequently events tracked on the backend (like file downloads) will be sent to Google Analytics. Syntax is defined at https://www.quartz-scheduler.org/api/2.3.0/org/quartz/CronTrigger.html
The example value (0 0/5 * * * ?) will run this task every 5 minutes.
For Google Analytics 4, you MUST also add the "api-secret" below to support sending download events.
Property: google.analytics.api-secret
Informational Note: (Only used for Google Analytics 4) Defines a Measurement Protocol API Secret to be used to track interactions which occur outside of the user's browser. This is REQUIRED to track downloads of bitstreams. For more details see https://developers.google.com/analytics/devguides/collection/protocol/ga4
Steps to create your API secret are also available from https://www.monsterinsights.com/docs/how-to-create-your-measurement-protocol-api-secret-in-ga4/
Property: google-analytics.bundles
Informational Note: Which Bundles to include in Bitstream statistics. By default, set to the ORIGINAL bundle only.
Google Analytics Reporting is not available in DSpace 7.0. While DSpace 7 can capture statistics via Google Analytics (see above), it is not able to display
Google Analytics reports in the DSpace User Interface (like was supported in the XMLUI). It is under discussion as it's unclear how many sites used this
feature. See DSpace Release 7.0 Status
As of DSpace version 5.0 it has also become possible to expose that recorded Google Analytics data within DSpace. The data is retrieved from Google
using the Google Analytics Reporting API v3. This feature is disabled by default, to enable it please follow the instructions below.
1. Logon to the Google Developers Console https://console.developers.google.com/project with whatever email address you use to access/manage
your existing Google Analytics account(s).
2. Create a new Google Project. The assumption is that you are developing some new software and will make use of the Google code repository.
This is not the case but you need to create the skeleton project before you can proceed to the next step.
3. Enable the Analytics API for the project. In the sidebar on the left, expand APIs & auth. Next, click APIs. In the list of APIs, make sure the status
is ON for the Analytics API.
4. In the sidebar on the left, select Credentials.
5. Select OAuth / Create new Client ID, then in the subsequent popup screen select Service account. This will automatically generate the
required Service Account email address and certificate.
6. Go to your Google Analytics dashboard http://www.google.com/analytics/. Create an account for the newly generated Service Account email
address and give it permission to 'Read and Analyze' at account level. See *Note below.
7. The generated certificate needs to be placed somewhere that your DSpace application can access, and be referenced as described below in the configuration section.
*Note:- The Google documentation specifies that the Service Account email address should only require 'Read and Analyze' permission. However, it would
appear this may not be the case and it may be necessary to grant greater permissions, at least initially.
Property: google-analytics.application.name
Informational Note: Not sure if this property is required, but it was in the example code provided by Google. Please do not delete.
Property: google-analytics.table.id
Example Value: ga:12345678
Informational Note: Logon to the Google Analytics Dashboard and select the Property (or website, in plain English) that you wish to target. Then select the Admin section for the property. You should then be able to select the 'view settings' for the view you are interested in. The View ID should replace 12345678 above (note: confusingly, the Reporting API documentation refers to the View ID as Table ID).
Property: google-analytics.account.email
Example Value: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx@developer.gserviceaccount.com
Informational Note: The email address automatically generated when you created the Service Account.
Property: google-analytics.certificate.location
Example Value: /home/example/dslweb--privatekey.p12
Informational Note: The certificate file automatically generated when you created the Service Account.
Property: google-analytics.authorization.admin.usage
Example Value: true
Informational Note: Controls whether the statistics pages should be shown only to authorized users. If enabled, only the administrators for the DSpaceObject will be able to view the statistics. If disabled, anyone with READ permissions on the DSpaceObject will be able to view the statistics.
Exchange usage statistics with IRUS
1 Introduction
2 Prerequisite
3 Configuration
4 Re-trying failed attempts
Introduction
IRUS (Institutional Repository Usage Statistics) enables Institutional Repositories to share and expose statistics based on the COUNTER standard.
It offers opportunities for benchmarking and acts as an intermediary between repositories and other agencies.
Prerequisite
The DSpace server should be able to access the tracker's base production and test URLs.
The tracker's base production URL will depend on the area/country where your repository is located:
https://irus.jisc.ac.uk/counter/test/
Access to the tracker's base URLs can easily be verified using a wget command with the applicable URL, e.g.:
wget https://irus.jisc.ac.uk/counter/test/
Configuration
The IRUS statistics tracker can be configured in the irus-statistics.cfg file, which can be found in [dspace-src]/dspace/config/modules.
irus.statistics.tracker.enabled : Configuration used to enable the IRUS statistics tracker. Set to true to enable. Default: false
irus.statistics.tracker.type-field : Metadata field to check if certain items should be excluded from tracking. If empty or commented out, all items are tracked.
irus.statistics.tracker.type-value : The values in the above metadata field that will be considered to be tracked.
irus.statistics.tracker.entity-types : The entity types to be included in the tracking. If left empty, only publication hits will be tracked. If entities are disabled in DSpace (the default in DSpace 7.1), then all Items will be included in tracking. Default: Publication
irus.statistics.tracker.environment : The tracker environment determines to which URL the statistics are exported (test or prod). Default: test
irus.statistics.tracker.testurl : The URL to which the trackings are exported when testing. (In theory, this should be https://irus.jisc.ac.uk/counter/test/)
irus.statistics.tracker.produrl : The URL to which the trackings are exported in production. (This will depend on your area/country; refer to the Prerequisite section.)
irus.statistics.spider.agentregex.url : External URL pointing to the COUNTER user agents file. The user agents file is downloaded from the provided URL as part of the Apache ant build process. Item views determined by DSpace to have been generated by bots/spiders are not sent to IRUS. Including this additional (and optional) agents file can reduce unnecessary network traffic by reducing the need to transfer view data that will be ignored by IRUS.
irus.statistics.spider.agentregex.regexfile : Location where the user agents file should be downloaded to. The Apache ant build process that retrieves the user agents file from the URL specified above places it in the location specified here. Example value: ${dspace.dir}/config/spiders/agents/COUNTER_Robots_list.txt
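As a sketch, enabling tracking against the IRUS test environment could look like this in irus-statistics.cfg (or overridden in local.cfg):

```
irus.statistics.tracker.enabled = true
irus.statistics.tracker.environment = test
irus.statistics.tracker.testurl = https://irus.jisc.ac.uk/counter/test/
```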
Re-trying failed attempts
[deployed-dspace]/bin/dspace retry-tracker
This will iterate over all the logged entries and retry committing them. If they fail again, they remain in the table, if they succeed, they are removed.
It is strongly advised to schedule this script to be executed daily or weekly (preferably at low-load times, during the night or weekend). If there are no failed entries, the script will not perform any actions and will exit immediately.
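Following the cron convention used elsewhere in this manual, a nightly schedule might be:

```
# Retry any failed IRUS submissions at 2:00AM each night
0 2 * * * [deployed-dspace]/bin/dspace retry-tracker
```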
User Interface
User Interface Configuration
User Interface Customization
User Interface Debugging
Accessibility
Browse
Discovery
Contextual Help Tooltips
IIIF Configuration
Multilingual Support
User Interface Configuration
Overview
Configuration File Format
Migrate environment file to YAML
Configuration Override
In 7.2 or above
In 7.1 or 7.0
Configuration Reference
Production Mode
UI Core Settings
Server Side Rendering (SSR) Settings
REST API Settings
Cache Settings - General
Cache Settings - Server Side Rendering (SSR)
Authentication Settings
Form Settings
Notification Settings
Submission Settings
Language Settings
Browse By Settings
Community-List Settings
Homepage Settings
Undo Settings
Item Access Labels
Item Page Settings
Theme Settings
Media Viewer Settings
Uploading video captioning files
Toggle end-user agreement and privacy policy
Settings for rendering Markdown, HTML and MathJax in metadata
Controlled Vocabularies in Search Filters
Universal (Server-side Rendering) Settings
Debug Settings
Overview
As the DSpace 7 User Interface is built on Angular.io, it aligns with many of the best practices of that platform & the surrounding community. One example
is that our UI uses the TypeScript language. That said, you do NOT need to be deeply familiar with TypeScript to edit themes or other configuration.
In DSpace 7.2 and later, the UI Configuration format changed to support runtime configuration loading
As of DSpace 7.2, the UI configuration format has changed to YAML in order to support runtime configuration. This means that reloading configurations
now simply requires restarting the UI (which generally takes a few seconds).
In DSpace 7.1 and below, you had to rebuild the UI anytime you modified a configuration setting. The UI configuration format was Typescript which
required recompiling each time a setting changed.
In DSpace 7.1 and 7.0, the configuration format was a TypeScript file located at ./src/environments/environment.*.ts. The structure of this file was essentially JSON-like.
If you are upgrading from 7.0 or 7.1 to 7.2 (or later), you will either need to migrate your old configuration file (from Typescript to YAML) or start fresh. You
can migrate your old (7.0 or 7.1) "environment.*.ts" configuration file to the new "config.*.yml" format (see below).
1. First, you will need to make minor modifications to the old "environment.*.ts" configuration file(s) to ensure it no longer imports any other files:
Replace all imports in environment.*.ts
// (1) FIRST, you must comment out or remove all 4 imports at the top of the file. For example:
interface GlobalConfig { }
enum NotificationAnimationsType {
Fade = 'fade',
FromTop = 'fromTop',
FromRight = 'fromRight',
FromBottom = 'fromBottom',
FromLeft = 'fromLeft',
Rotate = 'rotate',
Scale = 'scale'
}
enum BrowseByType {
Title = 'title',
Metadata = 'metadata',
Date = 'date'
}
enum RestRequestMethod {
GET = 'GET',
POST = 'POST',
PUT = 'PUT',
DELETE = 'DELETE',
OPTIONS = 'OPTIONS',
HEAD = 'HEAD',
PATCH = 'PATCH'
}
2. Now, you are ready to run the "yarn env:yaml" command to transform this old configuration into the new format.
# For example, from the 7.2 (or above) root directory, run this:
# yarn env:yaml relative/path/to/old/environment.prod.ts config/config.prod.yml
3. Finally, you should replace the old environment.*.ts config file(s) with the stock versions. They continue to provide default configuration values,
but customization should be done in the YAML files. If you had created additional environment files, those can be deleted.
Configuration Override
In 7.2 or above
Starting in 7.2, if you make a configuration update, you only need to restart the frontend. There is no need to rebuild unless you have made code changes
in "./src" directory or similar.
The UI configuration files reside in the ./config/ folder in the Angular UI source code. The default configuration is provided in config.yml.
To change the default configuration values, you simply create (one or more) local files that override the parameters you need to modify. You can use config.example.yml as a starting point.
For example, create a new config.dev.yml file in config/ for a development environment;
For example, create a new config.prod.yml file in config/ for a production environment;
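For instance, a minimal config.prod.yml for a proxied production deployment might look like the following sketch (hostnames are placeholders; the "rest" section points the UI at your REST API and mirrors the shape of the "ui" section documented later on this page):

```yaml
ui:
  ssl: false
  host: localhost
  port: 4000
  nameSpace: /
rest:
  ssl: true
  host: demo.dspace.org
  port: 443
  nameSpace: /server
```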
Using Environment variables. All environment variables MUST (1) be prefixed with "DSPACE_", (2) use underscores as separators (no dots
allowed), and (3) use all uppercase. Some examples are below:
# Other examples
defaultLanguage => DSPACE_DEFAULTLANGUAGE
mediaViewer.video => DSPACE_MEDIAVIEWER_VIDEO
Or, by creating a .env (environment) file in the project root directory and setting the environment variables in that location.
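For example, a .env file equivalent to the variables shown above (illustrative values):

```
# .env in the UI project root
DSPACE_DEFAULTLANGUAGE=en
DSPACE_MEDIAVIEWER_VIDEO=true
```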
The override priority ordering is as follows (with items listed at the top overriding all other settings)
1. Environment variables
2. The .env file
3. The ./config/config.prod.yml, ./config/config.dev.yml or ./config/config.test.yml files (depending on current mode)
4. The ./config/config.yml file
5. The hardcoded defaults in ./src/config/default-app-config.ts
In 7.1 or 7.0
The UI configuration files reside in the ./src/environments/ folder in the Angular UI source code. The default configuration is in environment.common.ts in that directory.
To change the default configuration values, you simply create (one or more) local files that override the parameters you need to modify. You can use environment.template.ts as a starting point.
For example, create a new environment.dev.ts file in src/environments/ for a development environment;
For example, create a new environment.prod.ts file in src/environments/ for a production environment;
The "ui" and "rest" sections of the configuration may also be overridden separately via one of the following
# "ui" settings environment variables
DSPACE_HOST # The host name of the angular application
DSPACE_PORT # The port number of the angular application
DSPACE_NAMESPACE # The namespace of the angular application
DSPACE_SSL # Whether the angular application uses SSL [true/false]
Or, by creating a .env (environment) file in the project root directory and setting the environment variables in that location.
The override priority ordering is as follows (with items listed at the top overriding all other settings)
1. Environment variables
2. The ".env" file
3. The "environment.prod.ts", "environment.dev.ts" or "environment.test.ts"
4. The "environment.common.ts"
Configuration Reference
The following configurations are available in ./src/environments/environment.common.ts. These settings may be overridden as described above.
Production Mode
Only valid for 7.1 or 7.0
As of 7.2 and above, Angular production mode is automatically enabled whenever you are running the app in Production mode (NODE_ENV=production, or 'yarn start:prod' or 'yarn serve:ssr'). Angular production mode is automatically disabled when you are running the app in Development mode (NODE_ENV=development, or 'yarn start:dev' or 'yarn serve').
When Production mode is enabled, this enables Angular's runtime production mode and compresses the built application. This should always be enabled in Production scenarios.
production: true
UI Core Settings
The "ui" (user interface) section defines where you want Node.js to run/respond. It may correspond to your primary/public URL, but it also may not (if you
are running behind a proxy). In this example, we are setting up our UI to just use localhost, port 4000. This is a common setup for when you want to use
Apache or Nginx to handle HTTPS and proxy requests to Node.js running on port 4000.
Format for 7.2 or later (config.*.yml)
ui:
  ssl: false
  host: localhost
  port: 4000
  # NOTE: Space is capitalized because 'namespace' is a reserved string in TypeScript
  nameSpace: /
  # The rateLimiter settings limit each IP to a 'max' of 500 requests per 'windowMs' (1 minute).
  rateLimiter:
    windowMs: 60000 # 1 minute
    max: 500 # limit each IP to 500 requests per windowMs
Format for 7.1 or 7.0 (environment.*.ts)
ui: {
  ssl: false,
  host: 'localhost',
  port: 4000,
  // NOTE: Space is capitalized because 'namespace' is a reserved string in TypeScript
  nameSpace: '/',
  // The rateLimiter settings limit each IP to a "max" of 500 requests per "windowMs" (1 minute).
  rateLimiter: {
    windowMs: 1 * 60 * 1000, // 1 minute
    max: 500 // limit each IP to 500 requests per windowMs
  }
},
The "rateLimiter" sub-section can be used to protect against a DOS (denial of service) attack when the UI is processed on the server side (i.e. server-side
rendering). Default settings are usually OK. In Angular, server-side rendering occurs to support better Search Engine Optimization (SEO), as well as to
support clients which cannot use Javascript. See the next section for more details.
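The semantics of "windowMs" and "max" can be sketched as a fixed-window counter per IP. This is an illustration only (a hypothetical class; the actual UI server delegates rate limiting to an Express middleware):

```typescript
// Sketch of a fixed-window rate limiter equivalent to the "rateLimiter"
// settings above: each IP may make at most `max` requests per `windowMs`.
class RateLimiter {
  private counts = new Map<string, { windowStart: number; count: number }>();

  constructor(private windowMs: number, private max: number) {}

  // Returns true if the request from this IP is allowed,
  // false if it should be rejected (e.g. with HTTP 429).
  allow(ip: string, now: number): boolean {
    const entry = this.counts.get(ip);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      // Start a new window for this IP
      this.counts.set(ip, { windowStart: now, count: 1 });
      return true;
    }
    entry.count++;
    return entry.count <= this.max;
  }
}

// Mirrors the defaults above: windowMs of 1 minute, max of 500 requests.
const limiter = new RateLimiter(60000, 500);
```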
Sub-path in frontend URL: When using a subpath (nameSpace) in your UI server base URL (e.g. "http://localhost:4000/mysite/" instead of "http://localhost:4000/"), you must make sure that the URL without the subpath is added to the rest.cors.allowed-origins list in [dspace]/config/modules/rest.cfg or the local.cfg override. The default value used for this configuration assumes that Origin and DSpace URL are identical, but CORS origins do not contain a subpath. Without this change you will see CORS policy errors preventing communication between the frontend and backend servers.
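As a hedged sketch, if your UI is served at http://localhost:4000/mysite/, the backend override could look like this in local.cfg (the exact origins depend on your own URLs; ${dspace.ui.url} is the variable DSpace uses for the configured UI URL):

```properties
# [dspace]/config/local.cfg (backend)
# CORS origins never include a path component, so add the UI origin *without*
# the /mysite/ subpath alongside the default value.
rest.cors.allowed-origins = ${dspace.ui.url}, http://localhost:4000
```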
Server Side Rendering (SSR) Settings
config.*.yml
universal:
  # (7.6.2 and later) Whether to tell Angular to inline "critical" styles into the server-side rendered HTML.
  # Determining which styles are critical is a relatively expensive operation; this option is
  # disabled (false) by default to boost server performance at the expense of loading smoothness.
  inlineCriticalCss: false
  # (7.6.3 and later) Path prefixes to enable SSR for. By default these are limited to paths of primary DSpace objects listed in the DSpace sitemap.
  # Paths are matched based on whether they "start with" a string in this configuration. Wildcards are not supported.
  # To disable this feature, specify [ '/' ], as that will result in all paths being enabled for SSR.
  paths: [ '/home', '/items/', '/entities/', '/collections/', '/communities/', '/bitstream/', '/bitstreams/', '/handle/' ]
  # (7.6.3 and later) Whether to enable rendering of the Search component in SSR.
  # If set to true the component will be included in the HTML returned from the server side rendering.
  # If set to false the component will not be included in the HTML returned from the server side rendering.
  enableSearchComponent: false
  # (7.6.3 and later) Whether to enable rendering of the Browse component in SSR.
  # If set to true the component will be included in the HTML returned from the server side rendering.
  # If set to false the component will not be included in the HTML returned from the server side rendering.
  enableBrowseComponent: false
  # (7.6.3 and later) Enable state transfer from the server-side application to the client-side application. (Defaults to true)
  # Note: When using an external application cache layer, it's recommended not to transfer the state to avoid caching it.
  # Disabling it ensures that dynamic state information is not inadvertently cached, which can improve security and
  # ensure that users always use the most up-to-date state.
  transferState: true
  # (7.6.3 and later) When a different REST base URL is used for the server-side application, the generated state contains references to
  # REST resources with the internal URL configured. By default, these internal URLs are replaced with public URLs.
  # Disable this setting to avoid URL replacement during SSR. In this case the state is not transferred, to avoid security issues.
  replaceRestUrl: true
The "paths" setting defines which DSpace pages will be processed in server-side rendering. For proper Search Engine Optimization, you should ensure
this "paths" setting includes all URL paths that you want to be indexed by search engine bots. Any paths not listed in this setting will be inaccessible to
most bots/crawlers (at least any that cannot process Javascript). The default "paths" settings are listed above, and they correspond to every page which is
listed in the Sitemaps or may be of interest to search engines. DSpace purposefully does not add "/search" or "/browse" pages to these paths because bot
requests to those pages may result in performance issues or require a larger amount of CPU to process the SSR. Keep in mind that values of "paths" are
matched against the start of a path, so "/items/" will match every Item page in the system.
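The prefix matching described above can be sketched as follows (illustrative helper only, not DSpace's actual code):

```typescript
// "Starts with" matching for the SSR "paths" setting: a request path gets
// server-side rendering if it begins with any configured prefix.
// Wildcards are not supported.
const ssrPaths = ['/home', '/items/', '/entities/', '/collections/',
  '/communities/', '/bitstream/', '/bitstreams/', '/handle/'];

function isSsrEnabled(path: string, prefixes: string[]): boolean {
  return prefixes.some(prefix => path.startsWith(prefix));
}
```

For example, every Item page (any path starting with '/items/') matches, while '/search' does not match and would only render client-side.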
The "enableSearchComponent" and "enableBrowseComponent" settings control whether the Angular Components related to search/browse are enabled
for SSR or not. These components are used in many different pages within the DSpace site, and these configurations are obeyed site wide. For example,
while the "/communities/" and "/collections/" pages do undergo SSR by default, both of these pages also include embedded browse options as separate
tabs. Setting "enableBrowseComponent: false" will ensure that the browse tabs for all Communities and Collections are not processed by SSR. Similarly,
some "/entities/" pages (e.g. Person Entities) may include an embedded search section on the entity page. Setting "enableSearchComponent: false" will
ensure that embedded search section for all Entities is not processed by SSR.
The "transferState" and "replaceRestUrl" settings are ONLY used if you've set the "ssrBaseUrl" setting in the REST API Settings (see next section). When the "ssrBaseUrl" is specified, a different REST API URL is used for all SSR requests. Setting "transferState" to true (default value) will ensure that the state of the server-side application is transferred to the client-side after SSR completes. This is the recommended behavior unless you are using an external cache layer. Setting "replaceRestUrl" to true (default value) will ensure that all URLs returned in the JSON response of an SSR request to the "ssrBaseUrl" are updated to use the public REST API URL (set by the "ssl", "host", "port" and "nameSpace" in the REST API Settings). This is the recommended behavior, as it ensures the client-side application only receives public URLs. Setting "replaceRestUrl" to false will disable this URL replacement, but will also disable "transferState", to ensure that state containing potentially "private" URLs is not transferred.
REST API Settings
This example is valid if your Backend is publicly available at https://mydspace.edu/server. Keep in mind that the "port" must always be specified, even if it's a standard port (i.e. port 80 for HTTP and port 443 for HTTPS).
Format for 7.2 or later (config.*.yml)
rest:
  ssl: true
  host: mydspace.edu
  port: 443
  # NOTE: Space is capitalized because 'namespace' is a reserved string in TypeScript
  nameSpace: /server
  # (7.6.3 and later) OPTIONAL: Provide a different REST API URL to be used during SSR execution.
  # It must contain the whole URL, including protocol, server port and server namespace
  ssrBaseUrl: http://localhost:8080/server
Format for 7.1 or 7.0 (environment.*.ts)
rest: {
  ssl: true,
  host: 'api.mydspace.edu',
  port: 443,
  // NOTE: Space is capitalized because 'namespace' is a reserved string in TypeScript
  nameSpace: '/server'
},
The "ssl", "host", "port" and "nameSpace" settings are all required, and are used to construct the URL used to contact the REST API. As noted above, these four settings should be kept in sync with the value of dspace.server.url in the backend's local.cfg.
The "ssrBaseUrl" provides the option to use a different URL for the REST API when performing Server Side Rendering (SSR) (see prior section). This is not enabled by default, but it can provide potential SSR performance benefits, as it allows all SSR requests to bypass DNS lookup. For example, the primary REST API settings (ssl, host, port, nameSpace) should always reference a public URL like "https://mydspace.edu/server". But, you could set the "ssrBaseUrl" to a localhost URL (e.g. "http://localhost:8080/server") if your REST API is running on the same machine as the User Interface. This would result in all client-side code (running in the user's browser) accessing the REST API via the public URL, while the server-side code (triggered by SSR) would access the REST API via the localhost URL. A few important tips:
The "ssrBaseUrl" need not always be a localhost URL, but it must be accessible to the machine where the User Interface's server-side code is running.
When the "ssrBaseUrl" is specified, you must also set the corresponding dspace.server.ssr.url in the backend's local.cfg. See Configuration Reference.
When the "ssrBaseUrl" is specified, there are additional "ssr" settings available, including "transferState" and "replaceRestUrl". See Server Side Rendering (SSR) above.
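The URL replacement performed by "replaceRestUrl" can be sketched like this (hypothetical helper; the real implementation operates on the transferred application state):

```typescript
// After an SSR request made against the internal ssrBaseUrl, internal URLs in
// the generated state are rewritten to the public REST API URL before the
// state is transferred to the browser.
const ssrBaseUrl = 'http://localhost:8080/server';   // internal URL used during SSR
const publicRestUrl = 'https://mydspace.edu/server'; // built from ssl/host/port/nameSpace

function replaceRestUrls(state: string): string {
  // Replace every occurrence of the internal URL with the public one.
  return state.split(ssrBaseUrl).join(publicRestUrl);
}
```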
Cache Settings
Format for 7.2 or later (config.*.yml)
cache:
  # NOTE: how long should objects be cached for by default
  msToLive:
    default: 900000 # 15 minutes
  # Default 'Cache-Control' HTTP Header to set for all static content (including compiled *.js files)
  # Defaults to one week. This lets a user's browser know that it can cache these files for one week,
  # after which they will be "stale" and need to be redownloaded.
  control: max-age=604800 # one week
  autoSync:
    defaultTime: 0
    maxBufferSize: 100
    timePerMethod:
      PATCH: 3 # time in seconds
Format for 7.1 or 7.0 (environment.*.ts)
cache: {
  // NOTE: how long should objects be cached for by default
  msToLive: {
    default: 15 * 60 * 1000, // 15 minutes
  },
  control: 'max-age=60', // revalidate browser
  autoSync: {
    defaultTime: 0,
    maxBufferSize: 100,
    timePerMethod: {[RestRequestMethod.PATCH]: 3} as any // time in seconds
  }
},
Caching options are also available for the User Interface's "server-side rendering" (which uses Angular Universal). Server-side rendering is used to pre-
generate full HTML pages and pass those back to users. This is necessary for Search Engine Optimization (SEO) as some web crawlers cannot use
Javascript. It also can be used to immediately show the first HTML page to users while the Javascript app loads in the user's browser.
While server-side-rendering is highly recommended on all sites, it can result in Node.js having to pre-generate many HTML pages at once when a site has
a large number of simultaneous users/bots. This may cause Node.js to spend a lot of time processing server-side-rendered content, slowing down the
entire site.
Therefore, DSpace provides some basic caching of server-side rendered pages, which allows the same pre-generated HTML to be sent to many users
/bots at once & decreases the frequency of server-side rendering.
Two cache options are provided: botCache and anonymousCache. As the names suggest, the botCache is used for known web crawlers / bots, while the anonymousCache may be used for all anonymous (non-authenticated) users. By default, only the botCache is enabled. But highly active sites may wish to enable the anonymousCache as well, since it can provide users with a more immediate response when they encounter cached pages.
Keep in mind, when the "anonymousCache" is enabled, this means that all non-authenticated users will utilize this cache. This cache can result in
massive speed improvements (for initial page load), as the majority of users may be interacting with cached content. However, these users may
occasionally encounter cached pages which are outdated or "stale" (based on values of "timeToLive" and "allowStale"). This means that these users will
not immediately see new updates or newly added content (Communities, Collections, Items) until the cache has refreshed itself. That said, when
"timeToLive" is set to a low value (like 10 seconds), this risk is minimal for highly active pages/content.
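The interaction between "timeToLive" and "allowStale" can be sketched as follows (illustrative helper only; the real cache also enforces the "max" entry limit):

```typescript
// Decide how to serve a cached SSR page given its age.
interface CachedPage { html: string; renderedAt: number; }

function serveFromCache(page: CachedPage, now: number, timeToLive: number, allowStale: boolean):
    { html: string | null; refreshInBackground: boolean } {
  const stale = now - page.renderedAt > timeToLive;
  if (!stale) {
    return { html: page.html, refreshInBackground: false }; // fresh cache hit
  }
  if (allowStale) {
    // Serve the stale copy immediately, then re-render behind the scenes.
    return { html: page.html, refreshInBackground: true };
  }
  // Stale and allowStale=false: the caller must wait on SSR for a fresh page.
  return { html: null, refreshInBackground: false };
}
```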
config.*.yml
cache:
  ...
  serverSide:
    # Set to true to see all cache hits/misses/refreshes in your console logs. Useful for debugging SSR caching issues.
    debug: false
    # When enabled (i.e. max > 0), known bots will be sent pages from a server side cache specific for bots.
    # (Keep in mind, bot detection cannot be guaranteed. It is possible some bots will bypass this cache.)
    botCache:
      # Maximum number of pages to cache for known bots. Set to zero (0) to disable server side caching for bots.
      # Default is 1000, which means the 1000 most recently accessed public pages will be cached.
      # As all pages are cached in server memory, increasing this value will increase memory needs.
      # Individual cached pages are usually small (<100KB), so max=1000 should only require ~100MB of memory.
      max: 1000
      # Amount of time after which cached pages are considered stale (in ms). After becoming stale, the cached
      # copy is automatically refreshed on the next request.
      # NOTE: For the bot cache, this setting may impact how quickly search engine bots will index new content on your site.
      # For example, setting this to one week may mean that search engine bots may not find all new content for one week.
      timeToLive: 86400000 # 1 day
      # When set to true, after timeToLive expires, the next request will receive the *cached* page & then re-render the page
      # behind the scenes to update the cache. This ensures users primarily interact with the cache, but may receive stale pages (older than timeToLive).
      # When set to false, after timeToLive expires, the next request will wait on SSR to complete & receive a fresh page (which is then saved to cache).
      # This ensures stale pages (older than timeToLive) are never returned from the cache, but some users will wait on SSR.
      allowStale: true
    # When enabled (i.e. max > 0), all anonymous users will be sent pages from a server side cache.
    # This allows anonymous users to interact more quickly with the site, but also means they may see slightly
    # outdated content (based on timeToLive)
    anonymousCache:
      # Maximum number of pages to cache. Default is zero (0) which means anonymous user cache is disabled.
      # As all pages are cached in server memory, increasing this value will increase memory needs.
      # Individual cached pages are usually small (<100KB), so a value of max=1000 would only require ~100MB of memory.
      max: 0
      # Amount of time after which cached pages are considered stale (in ms). After becoming stale, the cached
      # copy is automatically refreshed on the next request.
      # NOTE: For the anonymous cache, it is recommended to keep this value low to avoid anonymous users seeing outdated content.
      timeToLive: 10000 # 10 seconds
      # When set to true, after timeToLive expires, the next request will receive the *cached* page & then re-render the page
      # behind the scenes to update the cache. This ensures users primarily interact with the cache, but may receive stale pages (older than timeToLive).
      # When set to false, after timeToLive expires, the next request will wait on SSR to complete & receive a fresh page (which is then saved to cache).
      # This ensures stale pages (older than timeToLive) are never returned from the cache, but some users will wait on SSR.
      allowStale: true
Authentication Settings
The "auth" section provides some basic authentication-related settings. Currently, these are primarily settings controlling when a session timeout warning will be shown to your users.
Format for 7.2 or later (config.*.yml)
auth:
  # Authentication UI settings
  ui:
    # the amount of time before the idle warning is shown
    timeUntilIdle: 900000 # 15 minutes
    # the amount of time the user has to react after the idle warning is shown before they are logged out.
    idleGracePeriod: 300000 # 5 minutes
  # Authentication REST settings
  rest:
    # If the rest token expires in less than this amount of time, it will be refreshed automatically.
    # This is independent from the idle warning. Defaults to automatic refresh when the token will
    # expire within 2 minutes. Because the token expires after 30 minutes by default, this means automatic
    # refresh would occur every ~28 minutes.
    timeLeftBeforeTokenRefresh: 120000 # 2 minutes
Format for 7.1 or 7.0 (environment.*.ts)
auth: {
  // Authentication UI settings
  ui: {
    // the amount of time before the idle warning is shown
    timeUntilIdle: 15 * 60 * 1000, // 15 minutes
    // the amount of time the user has to react after the idle warning is shown before they are logged out.
    idleGracePeriod: 5 * 60 * 1000, // 5 minutes
  },
  // Authentication REST settings
  rest: {
    // If the rest token expires in less than this amount of time, it will be refreshed automatically.
    // This is independent from the idle warning.
    timeLeftBeforeTokenRefresh: 2 * 60 * 1000, // 2 minutes
  },
},
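The automatic token refresh rule can be sketched as follows (illustrative helper only):

```typescript
// Refresh when less than timeLeftBeforeTokenRefresh remains before expiry.
// With a 30-minute token and the default 2-minute threshold, refresh happens
// once ~28 minutes have elapsed.
function shouldRefreshToken(tokenExpiresAt: number, now: number,
                            timeLeftBeforeTokenRefresh: number): boolean {
  return tokenExpiresAt - now < timeLeftBeforeTokenRefresh;
}
```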
Form Settings
The "form" section provides basic settings for any forms displayed in the UI. At this time, these settings only include a validatorMap, which most sites will not need to modify.
Format for 7.2 or later (config.*.yml)
form:
  # (7.5 and above) Whether to enable "spellcheck" attribute of textareas in forms.
  spellCheck: true
  # NOTE: Map server-side validators to comparative Angular form validators
  validatorMap:
    required: required
    regex: pattern
Format for 7.1 or 7.0 (environment.*.ts)
form: {
  // NOTE: Map server-side validators to comparative Angular form validators
  validatorMap: {
    required: 'required',
    regex: 'pattern'
  }
},
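As an illustration, the validatorMap is just a name-to-name lookup from server-side validator names to Angular form validator names (sketch only; the actual wiring happens inside DSpace's form-building code):

```typescript
// Server-side validator name -> Angular form validator name,
// per the validatorMap configuration above.
const validatorMap: Record<string, string> = {
  required: 'required', // server "required" -> Angular Validators.required
  regex: 'pattern'      // server "regex"    -> Angular Validators.pattern
};

function toAngularValidator(serverValidator: string): string | undefined {
  return validatorMap[serverValidator];
}
```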
Notification Settings
The "notifications" section provides options related to where user notifications will appear in your UI. By default, they appear in the top right corner and time out after 5 seconds.
Format for 7.2 or later (config.*.yml)
notifications:
  rtl: false
  position:
    - top
    - right
  maxStack: 8
  # NOTE: after how many milliseconds a notification is closed automatically. If set to zero, notifications are not closed automatically
  timeOut: 5000 # 5 seconds
  clickToClose: true
  # NOTE: 'fade' | 'fromTop' | 'fromRight' | 'fromBottom' | 'fromLeft' | 'rotate' | 'scale'
  animate: scale
Format for 7.1 or 7.0 (environment.*.ts)
notifications: {
  rtl: false,
  position: ['top', 'right'],
  maxStack: 8,
  // NOTE: after how many milliseconds a notification is closed automatically. If set to zero, notifications are not closed automatically
  timeOut: 5000, // 5 seconds
  clickToClose: true,
  // NOTE: 'fade' | 'fromTop' | 'fromRight' | 'fromBottom' | 'fromLeft' | 'rotate' | 'scale'
  animate: NotificationAnimationsType.Scale
},
The set of valid animations can be found in NotificationAnimationsType; they are implemented in ./src/shared/animations/.
Submission Settings
The "submission" section provides some basic Submission/Deposit UI options. These allow you to optionally enable autosave (disabled by default), and to configure custom styles/icons for metadata fields or authority confidence values.
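The two autosave triggers (a watched metadata field changing, or the timer elapsing when timer > 0) can be sketched as follows (hypothetical helper; the example values are illustrative assumptions, not defaults):

```typescript
// Autosave fires when a configured metadata field changes, or when "timer"
// milliseconds have elapsed since the last save (timer: 0 disables the timer).
const autosave = { metadata: ['dc.title'], timer: 5 * 60 * 1000 }; // example values

function shouldAutosave(changedField: string | null, msSinceLastSave: number): boolean {
  if (changedField !== null && autosave.metadata.includes(changedField)) {
    return true; // a configured metadata field changed
  }
  return autosave.timer > 0 && msSinceLastSave >= autosave.timer;
}
```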
Format for 7.2 or later (config.*.yml)
submission:
  autosave:
    # NOTE: which metadata trigger an autosave
    metadata: []
    # NOTE: after how much time (milliseconds) the submission is saved automatically
    # eg. timer: 300000 # 5 minutes
    timer: 0
  icons:
    metadata:
      # NOTE: example of configuration
      #   # NOTE: metadata name
      # - name: dc.author
      #   # NOTE: fontawesome (v5.x) icon classes and bootstrap utility classes can be used
      #   style: fas fa-user
      - name: dc.author
        style: fas fa-user
      # default configuration
      - name: default
        style: ''
    authority:
      confidence:
        # NOTE: example of configuration
        #   # NOTE: confidence value
        # - value: 600
        #   # NOTE: fontawesome (v5.x) icon classes and bootstrap utility classes can be used
        #   style: fa-user
        - value: 600
          style: text-success
        - value: 500
          style: text-info
        - value: 400
          style: text-warning
        # default configuration
        - value: default
          style: text-muted
Format for 7.1 or 7.0 (environment.*.ts)
submission: {
  autosave: {
    // NOTE: which metadata trigger an autosave
    metadata: [],
    /**
     * NOTE: after how much time (milliseconds) the submission is saved automatically
     * eg. timer: 5 * (1000 * 60); // 5 minutes
     */
    timer: 0
  },
  icons: {
    metadata: [
      /**
       * NOTE: example of configuration
       * {
       *   // NOTE: metadata name
       *   name: 'dc.author',
       *   // NOTE: fontawesome (v5.x) icon classes and bootstrap utility classes can be used
       *   style: 'fa-user'
       * }
       */
      {
        name: 'dc.author',
        style: 'fas fa-user'
      },
      // default configuration
      {
        name: 'default',
        style: ''
      }
    ],
    authority: {
      confidence: [
        /**
         * NOTE: example of configuration
         * {
         *   // NOTE: confidence value
         *   value: 600,
         *   // NOTE: fontawesome (v4.x) icon classes and bootstrap utility classes can be used
         *   style: 'fa-user'
         * }
         */
        {
          value: 600,
          style: 'text-success'
        },
        {
          value: 500,
          style: 'text-info'
        },
        {
          value: 400,
          style: 'text-warning'
        },
        // default configuration
        {
          value: 'default',
          style: 'text-muted'
        },
      ]
    }
  }
},
Language Settings
The "defaultLanguage" and "languages" sections allow you to customize which languages to support in your User Interface. See also Multilingual Support.
Format for 7.2 or later (config.*.yml)
# Default Language in which the UI will be rendered if the user's browser language is not an active language
defaultLanguage: en
# Languages. DSpace Angular holds a message catalog for each of the following languages.
# When set to active, users will be able to switch to the use of this language in the user interface.
# All out of the box language packs may be found in the ./src/assets/i18n/ directory
languages:
  - code: en
    label: English
    active: true
  - code: cs
    label: Čeština
    active: true
  - code: de
    label: Deutsch
    active: true
  - ...
Format for 7.1 or 7.0 (environment.*.ts)
// Default Language in which the UI will be rendered if the user's browser language is not an active language
defaultLanguage: 'en',
// Languages. DSpace Angular holds a message catalog for each of the following languages.
// When set to active, users will be able to switch to the use of this language in the user interface.
languages: [{
  code: 'en',
  label: 'English',
  active: true,
}, {
  code: 'de',
  label: 'Deutsch',
  active: true,
},
  ...
],
The DSpace UI requires that a corresponding language pack file (named with the language code and ending in ".json5") be placed in ./src/assets
/i18n/. See also DSpace 7 Translation - Internationalization (i18n) - Localization (l10n) for information about how to create and contribute these files.
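The language fallback described above can be sketched as follows (illustrative helper; the language list here is an assumption for the example):

```typescript
// Use the browser language when it is an active language;
// otherwise fall back to defaultLanguage.
const defaultLanguage = 'en';
const languages = [
  { code: 'en', label: 'English', active: true },
  { code: 'de', label: 'Deutsch', active: true },
  { code: 'fr', label: 'Français', active: false }, // present but not active
];

function resolveUiLanguage(browserLanguage: string): string {
  const match = languages.find(l => l.code === browserLanguage && l.active);
  return match ? match.code : defaultLanguage;
}
```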
Browse By Settings
In 7.2 or above, the "browseBy" section only provides basic UI configurations for "Browse by" pages (/browse path). The "Browse by" options that appear
in the "All of DSpace" header menu are determined dynamically from the REST API. This allows the UI to change dynamically based on the configured
browse indexes in Discovery.
Format for 7.2 or later (config.*.yml)
browseBy:
  # Amount of years to display using jumps of one year (current year - oneYearLimit)
  oneYearLimit: 10
  # Limit for years to display using jumps of five years (current year - fiveYearLimit)
  fiveYearLimit: 30
  # The absolute lowest year to display in the dropdown (only used when no lowest date can be found for all items)
  defaultLowerLimit: 1900
  # If true, thumbnail images for items will be added to BOTH search and browse result lists. (default: true)
  showThumbnails: true
  # The number of entries in a paginated browse results list.
  # Rounded to the nearest size in the list of selectable sizes on the settings menu.
  pageSize: 20
# NOTE: The "types" section no longer exists, as it is determined dynamically via the REST API
NOTE: The "pageSize" configuration will always round to the closest "pageSizeOptions" value listed in "page-component-options.model.ts"
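A sketch of that rounding, assuming pageSizeOptions holds the selectable sizes from page-component-options.model.ts (the exact values and tie-breaking behavior in DSpace may differ, so treat this as an illustration only):

```typescript
// Round a configured pageSize to the nearest selectable size.
const pageSizeOptions = [5, 10, 20, 40, 60, 80, 100]; // assumed values

function roundToPageSizeOption(requested: number): number {
  return pageSizeOptions.reduce((best, option) =>
    Math.abs(option - requested) < Math.abs(best - requested) ? option : best);
}
```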
In 7.1 or 7.0, the "browseBy" section allowed you to customize which "Browse by" options appear in the "All of DSpace" header menu at the top of your
DSpace site. The "id" MUST correspond to the name of a valid Browse index available from your REST API (see documentation on the REST API /api
/discover/browses endpoint). It is possible to configure additional indexes on the Backend using Discovery, and any configured index appears in your
REST API.
browseBy: {
  // Amount of years to display using jumps of one year (current year - oneYearLimit)
  oneYearLimit: 10,
  // Limit for years to display using jumps of five years (current year - fiveYearLimit)
  fiveYearLimit: 30,
  // The absolute lowest year to display in the dropdown (only used when no lowest date can be found for all items)
  defaultLowerLimit: 1900,
  // List of all the active Browse-By types
  // Adding a type will activate their Browse-By page and add them to the global navigation menu,
  // as well as community and collection pages
  // Allowed fields and their purpose:
  //   id: The browse id to use for fetching info from the rest api
  //   type: The type of Browse-By page to display
  //   metadataField: The metadata-field used to create starts-with options (only necessary when the type is set to 'date')
  types: [
    {
      id: 'title',
      type: BrowseByType.Title,
    },
    {
      id: 'dateissued',
      type: BrowseByType.Date,
      metadataField: 'dc.date.issued'
    },
    {
      id: 'author',
      type: BrowseByType.Metadata
    },
    {
      id: 'subject',
      type: BrowseByType.Metadata
    }
  ]
},
Community-List Settings
Available in 7.4 or later
The "communityList" section allows you to configure the behavior of the "Communities & Collections" page (/community-list path), which is linked in the
header.
config.*.yml
communityList:
  # Number of communities to list per expansion (i.e. each time you click "show more")
  pageSize: 20
NOTE: The "pageSize" configuration will always round to the closest "pageSizeOptions" value listed in "page-component-options.model.ts"
Homepage Settings
Available in 7.4 or later
The "homePage" section allows you to configure the behavior of the DSpace homepage (/ path).
config.*.yml
homePage:
  recentSubmissions:
    # The number of items shown in the recent submissions list. Set to "0" to hide all recent submissions
    pageSize: 5
    # Date field to use to sort recent submissions
    sortField: 'dc.date.accessioned'
  topLevelCommunityList:
    # Number of communities to list (per page) on the home page
    # This will always round to the nearest number from the list of page sizes, e.g. if you set it to 7 it'll use 10
    pageSize: 5
NOTE: The "pageSize" configuration will always round to the closest "pageSizeOptions" value listed in "page-component-options.model.ts"
Undo Settings
Both the "item" edit and "collection" edit screens allow you to undo changes within a specific time. This is controlled by these settings:
Format for 7.2 or later (config.*.yml)
item:
  edit:
    undoTimeout: 10000 # 10 seconds
collection:
  edit:
    undoTimeout: 10000 # 10 seconds
Format for 7.1 or 7.0 (environment.*.ts)
item: {
  edit: {
    undoTimeout: 10000 // 10 seconds
  }
},
collection: {
  edit: {
    undoTimeout: 10000 // 10 seconds
  }
},
Item Access Labels
Item access labels allow displaying, for each item in search results, whether it is Open Access, under embargo, restricted, or metadata only (i.e. it does not contain any file/bitstream). This feature is disabled by default, but can be enabled in your config.*.yml.
config.*.yml
# Item Config
item:
  # Show the item access status label in items lists (default=false)
  showAccessStatuses: true
The "item" section allows you to configure the behavior of the Item pages.
config.*.yml
item:
  ...
  bitstream:
    # Number of entries in the bitstream list in the item view page.
    pageSize: 5
NOTE: The "pageSize" configuration will always round to the closest "pageSizeOptions" value listed in "page-component-options.model.ts"
Theme Settings
The "themes" section allows you to configure which theme(s) are enabled for your DSpace site (the default theme being "dspace"). You can enable a single theme across all pages, and/or enable specific alternative themes based on a specific Community, Collection or Item (by UUID or Handle), or based on a regex match of a URL pattern. This gives you fine-grained control over how your site looks, including the ability to customize it per Community or Collection, or even per specific pages in the site. See User Interface Customization for details on how to create a new, custom theme.
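The first-match theme resolution can be sketched as follows (hypothetical helper; unlike the real implementation, this sketch ignores the fact that handle/uuid matches also apply to all objects *within* the matched Community or Collection):

```typescript
// Themes are checked in configured order and the first match wins,
// so a catch-all (name-only) theme should come last.
interface ThemeConfig { name: string; handle?: string; uuid?: string; regex?: string; }

function resolveTheme(themes: ThemeConfig[], route: string,
                      dso?: { handle?: string; uuid?: string }): string | undefined {
  for (const theme of themes) {
    if (theme.handle && dso?.handle === theme.handle) { return theme.name; }
    if (theme.uuid && dso?.uuid === theme.uuid) { return theme.name; }
    if (theme.regex && new RegExp(theme.regex).test(route)) { return theme.name; }
    // A theme with only a name matches every route.
    if (!theme.handle && !theme.uuid && !theme.regex) { return theme.name; }
  }
  return undefined;
}

const themeList: ThemeConfig[] = [
  { name: 'custom', handle: '10673/1233' },
  { name: 'regex-theme', regex: 'collections\\/e8043bc2.*' },
  { name: 'dspace' } // catch-all, listed last
];
```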
Format for 7.2 or later (config.*.yml)
themes:
  # Add additional themes here. In the case where multiple themes match a route, the first one
  # in this list will get priority. It is advisable to always have a theme that matches
  # every route as the last one
  #
  # # A theme with a handle property will match the community, collection or item with the given
  # # handle, and all collections and/or items within it
  # - name: 'custom'
  #   handle: '10673/1233'
  #
  # # A theme with a regex property will match the route using a regular expression. If it
  # # matches the route for a community or collection it will also apply to all collections
  # # and/or items within it
  # - name: 'custom'
  #   regex: 'collections\/e8043bc2.*'
  #
  # # A theme with a uuid property will match the community, collection or item with the given
  # # ID, and all collections and/or items within it
  # - name: 'custom'
  #   uuid: '0958c910-2037-42a9-81c7-dca80e3892b4'
  #
  # # The extends property specifies an ancestor theme (by name). Whenever a themed component is not found
  # # in the current theme, its ancestor theme(s) will be checked recursively before falling back to default.
  # - name: 'custom-A'
  #   extends: 'custom-B'
  #   # Any of the matching properties above can be used
  #   handle: '10673/34'
  #
  # - name: 'custom-B'
  #   extends: 'custom'
  #   handle: '10673/12'
  #
  # # A theme with only a name will match every route
  # - name: 'custom'
  #
  # # This theme will use the default bootstrap styling for DSpace components
  # - name: BASE_THEME_NAME
  #
  - name: dspace
    # Whenever this theme is active, the following tags will be injected into the <head> of the page.
    # Example use case: set the favicon based on the active theme.
    headTags:
      - tagName: link
        attributes:
          rel: icon
          href: assets/dspace/images/favicons/favicon.ico
          sizes: any
      - tagName: link
        attributes:
          rel: icon
          href: assets/dspace/images/favicons/favicon.svg
          type: image/svg+xml
      - tagName: link
        attributes:
          rel: apple-touch-icon
          href: assets/dspace/images/favicons/apple-touch-icon.png
      - tagName: link
        attributes:
          rel: manifest
          href: assets/dspace/images/favicons/manifest.webmanifest
Format for 7.1 or 7.0 (environment.*.ts)
themes: [
// Add additional themes here. In the case where multiple themes match a route, the first one
// in this list will get priority. It is advisable to always have a theme that matches
// every route as the last one
// {
// // A theme with a handle property will match the community, collection or item with the given
// // handle, and all collections and/or items within it
// name: 'custom',
// handle: '10673/1233'
// },
// {
// // A theme with a regex property will match the route using a regular expression. If it
// // matches the route for a community or collection it will also apply to all collections
// // and/or items within it
// name: 'custom',
// regex: 'collections\/e8043bc2.*'
// },
// {
// // A theme with a uuid property will match the community, collection or item with the given
// // ID, and all collections and/or items within it
// name: 'custom',
// uuid: '0958c910-2037-42a9-81c7-dca80e3892b4'
// },
// {
// // Using the "extends" property allows a theme to extend/borrow from an ancestor theme (by name).
// // Wherever a theme component is now found in this themes, its ancestor theme(s) will be checked
// // recursively before falling back to default.
// name: 'custom-A',
// extends: 'custom-B',
// // Any of the matching properties above can be used
// handle: '10673/34',
// },
// {
// name: 'custom-B',
// extends: 'custom',
// handle: '10673/12',
// },
// {
// // A theme with only a name will match every route
// name: 'custom'
// },
// {
// // This theme will use the default bootstrap styling for DSpace components
// name: BASE_THEME_NAME
// },
{
// The default dspace theme
name: 'dspace',
// Whenever this theme is active, the following tags will be injected into the <head> of the page.
// Example use case: set the favicon based on the active theme.
headTags: [
{
// Insert <link rel="icon" href="assets/dspace/images/favicons/favicon.ico" sizes="any"/> into the <head> of the page.
tagName: 'link',
attributes: {
'rel': 'icon',
'href': 'assets/dspace/images/favicons/favicon.ico',
'sizes': 'any',
}
},
...
]
},
],
Media Viewer Settings
The DSpace UI comes with a basic, out-of-the-box Media Viewer (disabled by default). This media viewer can support any files whose MIME Type begins with "image/*", "video/*", or "audio/*".
# Whether to enable media viewer for image and/or video Bitstreams (i.e. Bitstreams whose MIME type starts with 'image' or 'video').
# When "image: true", this enables a gallery viewer where you can zoom or page through images.
# When "video: true", this enables embedded video streaming. This embedded video streamer also supports audio files.
mediaViewer:
  image: false
  video: false
To add captions (subtitles) to a video in the media viewer:
1. The Item must already have a Bitstream which is a video file (in a "video/*" format) in the ORIGINAL bundle. In this example, we'll assume it is named "myVideo.mp4".
2. Upload a corresponding WebVTT Caption file named "[video-filename]-[languageCode].vtt" to the ORIGINAL bundle.
a. For a video named "myVideo.mp4", an English caption file would be named "myVideo.mp4-en.vtt".
b. If an additional Spanish language caption file was uploaded, it should be named "myVideo.mp4-es.vtt".
c. All WebVTT Caption files MUST use two-letter ISO 639-1 Language Codes. A list of all supported Language Codes can be found in "src/app/item-page/media-viewer/media-viewer-video/language-helper.ts".
3. Once the Caption file is uploaded, reload the video viewer (on the Item page). You should now see that the "Captions" (or CC) option is available. (Depending on the browser you use, this option may appear in the lower menu of the video, or require you to open an options menu.) Selecting it will enable captioning in your language of choice.
// Whether to enable media viewer for image and/or video Bitstreams (i.e. Bitstreams whose MIME type starts with "image" or "video").
// For images, this enables a gallery viewer where you can zoom or page through images.
// For videos, this enables embedded video streaming
mediaViewer: {
  image: false,
  video: false,
},
The DSpace UI comes with basic end-user agreement and privacy policy functionality. Since release 7.4, these features can be disabled in a configuration file. More information on what disabling one of these features results in is documented in the default app configuration (see code snippet below).
config.*.yml
info:
# Whether the end user agreement is required before users may use the repository.
# If enabled, the user will be required to accept the agreement before they can use the repository.
# If disabled, the page will not exist and no agreement is required to use the repository
enableEndUserAgreement: false
# Whether the privacy statement should exist or not.
enablePrivacyStatement: false
The DSpace UI can support Markdown (using https://commonmark.org/) and MathJax (https://www.mathjax.org) in metadata field values. Both Markdown
and MathJax are disabled by default.
Markdown allows inline HTML, so enabling the markdown option will ensure HTML tags in metadata field values get rendered as well.
MathJax will only be rendered if Markdown is enabled, so configuring 'markdown.mathjax = true' with 'markdown.enabled = false' will have no effect.
By default, only the "dc.description.abstract" metadata supports these formats when enabled. To enable markdown for other metadata fields, a custom sub-component of the ItemPageFieldComponent has to be created for that metadata field, with the enableMarkdown field set to true. Refer to the ItemPageAbstractFieldComponent component for an example.
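In 7.2 or later, both features are toggled in config/config.*.yml. A sketch of the relevant section (key names follow the default config.example.yml; verify against your version):

```yaml
markdown:
  # Whether to render Markdown (and inline HTML) in supported metadata values
  enabled: false
  # Whether to also render MathJax (has no effect unless "enabled" is true)
  mathjax: false
```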
When using hierarchical controlled vocabularies (e.g. SRSC as described in Authority Control of Metadata Values), it's possible to search using the
controlled vocabulary hierarchy via the search filters. To enable this feature, you must specify the filter and vocabulary to enable as follows:
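A sketch of the corresponding config/config.*.yml section (the 'subject'/'srsc' pairing mirrors the default config.example.yml; substitute your own filter and vocabulary):

```yaml
vocabularies:
  # "filter" must be a valid search filter; "vocabulary" must be installed on the backend
  - filter: 'subject'
    vocabulary: 'srsc'
    enabled: true
```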
Keep in mind, the "filter" MUST be a valid search filter (e.g. subject, author) as seen on the "/api/discover/facets" REST API endpoint. The "vocabulary" MUST be a valid controlled vocabulary installed in your DSpace backend (under the "[dspace]/config/controlled-vocab/" folder) based on the documentation at Authority Control of Metadata Values.
When this feature is enabled, you should see a "Browse [filter] tree" link in the search filter on the search results page (and anywhere search filters are
shown). This "Browse [filter] tree" link will allow you to select a search filter from within the configured hierarchical vocabulary.
As of DSpace 7.2, these settings are no longer editable. Universal is automatically enabled at all times to support Search Engine Optimization.
The "universal" section pertains to enabling/disabling Angular Universal for server-side rendering. DSpace requires server-side rendering to support Search Engine Optimization. When it's turned off, your site may not be indexed by Google, Google Scholar and other search engines.
environment.*.ts
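In 7.0 or 7.1, the section in environment.*.ts looked roughly like the following (field names are taken from the 7.x environment.common.ts defaults; verify against your version):

```typescript
// Angular Universal (server-side rendering) settings -- editable in 7.0/7.1 only
universal: {
  preboot: true,
  async: true,
  time: false
},
```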
Debug Settings
The "debug" property allows you to turn on debugging in the Angular UI. When enabled, your environment and all Redux actions/transfers are logged to
the console. This is only ever needed if you are debugging a tricky issue.
Format for 7.2 or later (config.*.yml)
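For example, debugging is disabled by default:

```yaml
# When true, your environment and all Redux actions/transfers are logged to the console
debug: false
```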
User Interface Customization
Angular Overview
Theme Technologies
Running the UI in Developer Mode
Creating a Custom Theme
Theme Directories & Design Principles
Getting Started
Global style/font/color customizations
Customize Logo in Header
Customize Navigation Links in Header
Customize Footer
Customize Favicon for site or theme
Customize Home Page News
Customize the simple Item page
Customize other Components in your Theme
Customize UI labels using Internationalization (i18n) files
Extending other Themes
Adding Component Directories to your Theme
Removing Component Directories from your Theme
Debugging which theme is being used
Finding which component is generating the content on a page
Additional Theming Resources
Angular Overview
The DSpace User Interface (UI) is built on the Angular.io framework. All data comes from the REST API (DSpace Backend), and the final HTML pages
are generated via TypeScript.
Before getting started in customizing or branding the UI, there are some basic Angular concepts to be aware of. You do not need to know Angular or
TypeScript to theme or brand the UI. But, understanding a few basic concepts will allow you to better understand the folder structure / layout of the
codebase.
Angular Components: In Angular, every webpage consists of a number of "components" which define the structure of a page. They are the main "building block" of any Angular application. Components are reusable across many pages. So, for example, there's only one "header" and one "footer" component, even though they appear across all pages. Each component usually consists of three files:
A *.component.ts (TypeScript) file which contains the logic & name ("selector") of the component
A *.component.html (HTML) file which contains the HTML markup for the component (and possibly references to other embedded components). This is also called the "template".
In HTML files, components are named/referenced as HTML-like tags (e.g. <ds-header>, <ds-footer>). In DSpace's UI, every component starts with "ds-" in order to distinguish it as an out-of-the-box DSpace component.
A *.component.scss (Sass / CSS) file which contains the style for the component.
If you want a deeper dive into Angular concepts of Components and Templates, see https://angular.io/guide/architecture-components
Theme Technologies
The DSpace UI uses:
Bootstrap (v4.x) website framework for general layout & webpage components (buttons, alerts, etc)
Sass, a CSS preprocessor, for stylesheets. Sass is very similar to CSS (and in fact, any CSS is valid Sass). But, Sass allows you to nest CSS rules & have variables and functions. For a brief overview on Sass, see https://sass-lang.com/guide
HTML5, the latest specification of the HTML language
Familiarity with these technologies (or even just CSS + HTML) is all you need to do basic theming of the DSpace UI.
Running the UI in Developer Mode
While customizing or theming the UI, it's easiest to run it in developer mode:
yarn start:dev
When run in developer mode, the UI will constantly "watch" for changes and automatically reload anytime you modify a file. This lets you find issues/bugs more rapidly and also test more rapidly.
Keep in mind, you should NEVER run the UI in developer mode in production scenarios. Production mode provides much better performance and ensures
your site fully supports SEO, etc.
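For reference, the two modes map to the UI's yarn scripts (script names as defined in the dspace-angular package.json; verify against your version):

```shell
# Developer mode: watches for changes and rebuilds/reloads automatically (never use in production)
yarn start:dev

# Production mode: ahead-of-time build, then serve with server-side rendering
yarn build:prod
yarn serve:ssr
```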
Each theme directory contains the following:
app/ contains the theme's Angular components and should mirror the structure of src/app/
assets/ contains the theme's custom assets, such as fonts or images
styles/ contains the theme's global styles
eager-theme.module.ts declares the components that should be included in the app's main bundle, such as:
Eager components: those that should be available immediately when first loading, such as the main parts of the homepage and components that are present on every page.
Entry components that are registered via a decorator such as @listableObjectComponent. These must also be included in the module's providers.
lazy-theme.module.ts declares all the other components of the theme.
Out of the box, there are three theming layers/directories to be aware of:
Base Theme (src/app/ directories): The primary look & feel of DSpace (e.g. HTML layout, header/footer, etc) is defined by the HTML5
templates under this directory. Each HTML5 template is stored in a subdirectory named for the Angular component where that template is used.
The base theme includes very limited styling (CSS, etc), based heavily on default Bootstrap (4.x) styling, and only allowing for minor tweaks to
improve WCAG 2.1 AA accessibility.
Custom Theme (src/themes/custom directories): This directory acts as the scaffolding or template for creating a new custom theme. It
provides (empty) Angular components/templates which allow you to change the theme of individual components. Since all files are empty by
default, if you enable this theme (without modifying it), it will look identical to the Base Theme.
DSpace Theme (src/themes/dspace directories): This is the default theme for DSpace 7. It's a very simple example theme providing a
custom color scheme, header & homepage on top of the Base Theme. It's important to note that this theme ONLY provides custom CSS/images
to override our Base Theme. All HTML5 templates are included at the Base Theme level, as this ensures those HTML5 templates are also
available to the Custom Theme.
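The three layers can be summarized as a directory sketch (paths relative to the UI source root):

```
src/
  app/             Base Theme: HTML5 templates + minimal styling for every component
  themes/
    custom/        Custom Theme: empty scaffolding you can copy or edit to build your own theme
    dspace/        DSpace Theme: the default theme (custom CSS/images layered over the Base Theme)
```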
The DSpace UI design principles & technologies are described in more detail at DSpace UI Design principles and guidelines
Getting Started
1. Choose a theme to start from: As documented above, there are two "src/themes/" directories provided out of the box: "custom" or "dspace". You should select one to use as the basis for your theme. Which you choose is up to you, but here are a few things to consider:
a. DSpace Theme (src/themes/dspace): This is a simple, example theme for novice users. Primarily, in this theme, you can
immediately customize the CSS, header & homepage components. You can add other components as needed (see "Adding Component
Directories to your Theme" below).
i. Advantages: This theme is small and simple. It provides an easy starting point / example for basic themes. Future User Interface upgrades (e.g. from 7.1 to 7.2) are likely to be easier because the theme is smaller in size.
ii. Disadvantages: It has very few component directories by default. But you can always add more. See "Adding Component
Directories to your Theme" below.
b. Custom Theme (src/themes/custom): This theme provides all available theme-able components for more advanced or complex
theming options. This provides you full control over everything that is theme-able in the User Interface
i. Advantages: All theme-able components are provided in subdirectories. This makes it easier to modify the look and feel of any
area of the User Interface.
ii. Disadvantages: After creating your theme, you may wish to remove any component directories that you didn't modify (see "Removing Component Directories from your Theme" below). Generally speaking, upgrades (e.g. from 7.1 to 7.2) are often easier if your theme includes fewer components (as your theme may require updates if any component it references changes significantly).
2. Create your own theme folder OR edit the existing theme folder: Either edit the theme directory in place, or copy it (and all its contents) into a new
folder under src/themes/ (choose whatever folder name you want)
3. Register your theme folder (only necessary if you create a new folder in previous step): Now, we need to make the UI aware of this new theme
folder, before it can be used in configuration.
a. Modify angular.json (in the root folder), adding your theme folder's main "theme.scss" file to the "styles" list. The below example is
for a new theme folder named src/themes/mydspacesite/
"styles": [
"src/styles/startup.scss",
{
"input": "src/styles/base-theme.scss",
"inject": false,
"bundleName": "base-theme"
},
...
{
"input": "src/themes/mydspacesite/styles/theme.scss",
"inject": false,
"bundleName": "mydspacesite-theme"
},
]
NOTE: the "bundleName" for your custom theme MUST use the format "${folder-name}-theme". E.g. if the folder is named "src/themes/amazingtheme", then the "bundleName" MUST be "amazingtheme-theme".
4. (As of 7.3 or above) Import the new theme's eager-theme.module.ts in themes/eager-themes.module.ts. If you're switching from one
theme to another, remove the old theme from the imports. Below is an example for a theme named "my-theme":
themes/eager-themes.module.ts
// COMMENT out the imports for any themes you are NOT using
//import { EagerThemeModule as DSpaceEagerThemeModule } from './dspace/eager-theme.module';
//import { EagerThemeModule as CustomEagerThemeModule } from './custom/eager-theme.module';
// Add a new import for your custom theme. Give its EagerThemeModule a unique name (e.g. "as [choose-a-unique-name]").
// Make sure the path points at its "eager-theme.module.ts" (see 'from' portion of the import statement).
// NOTE: You can import multiple themes if you plan to use multiple themes
import { EagerThemeModule as MyThemeEagerThemeModule } from './my-theme/eager-theme.module';
...
@NgModule({
  imports: [
    // Again, comment out any themes you are NOT using
    //DSpaceEagerThemeModule,
    //CustomEagerThemeModule,
    // ...and add your own theme's EagerThemeModule (using the name chosen in the import above)
    MyThemeEagerThemeModule,
  ],
})
5. Enable your theme: Modify your config/config.*.yml configuration file (in 7.1 or 7.0 this file was named src/environments/environment.*.ts), adding your new theme to the "themes" array in that file. Pay close attention to modify the correct configuration file (e.g. modify config.dev.yml if running in dev mode, or config.prod.yml if running in prod mode). We recommend starting in "dev mode" (config.dev.yml) as this mode lets you see your changes immediately in a browser without a full rebuild of the UI – see next step.
# In this example, we only show one theme enabled. It's possible to enable multiple (see below note)
themes:
- name: 'mydspacesite'
Format for 7.1 or 7.0 (environment.*.ts)
// In this example, we only show one theme enabled. It's possible to enable multiple (see below note)
themes: [
{
name: 'mydspacesite'
},
]
NOTE: The "name" used is the name of the theme's folder, so the example is for enabling a theme at src/themes/mydspacesite/
globally. You should also comment out the default "dspace" theme, if you intend to replace it entirely.
NOTE #2: You may also choose to enable multiple themes for your site, and even specify a different theme for different Communities,
Collections, Items or URL paths. See User Interface Configuration for more details on "Theme Settings"
6. Verify your settings by starting the UI (ideally in Dev mode): At this point, you should verify the basic settings you've made all "work". We
recommend doing your theme work while running the UI in "dev mode", as the UI will auto-restart anytime you save a new change. This will allow
you to quickly see the impact of each change in your browser.
7. At this point, you can start making changes to your theme. See the following sections for examples of how to make common changes.
1. Global style changes: All global style changes can be made in your theme's styles folder (e.g. src/themes/mydspacesite/styles). There
are four main files in that folder:
a. _theme_sass_variable_overrides.scss - May be used to override Bootstrap's default Sass variables. This is the file you may wish to use for most style changes. There are a large number of Bootstrap variables available which control everything from fonts, colors and the base style for all Bootstrap web components. For a full list of Bootstrap variables you can override in this file, see the node_modules/bootstrap/scss/_variables.scss file (which is installed in your source directory when you run "yarn install"). More information may also be found in the Bootstrap Sass documentation at https://getbootstrap.com/docs/4.0/getting-started/theming/#sass
b. _theme_css_variable_overrides.scss - May be used to override DSpace's default CSS variables. DSpace's UI uses CSS variables for all its components. These variables all start with "--ds-*", and are listed in src/styles/_custom_variables.scss. You can also use this file to add your own custom CSS variables which you want to use for your theme. If you create custom variables, avoid naming them with a "--ds-*" or a "--bs-*" prefix, as those are reserved for DSpace and Bootstrap variables.
c. _global-styles.scss - May be used to modify the global CSS/SCSS for the site. This file may be used to override the default global
style contained in src/styles/_global-styles.scss . Keep in mind, many styles can be more quickly changed by simply
updating a variable in one of the above "*_variable_overrides.scss" files. So, it's often easier to use those first, where possible.
d. theme.scss - This just imports all the necessary Sass files to create the theme. It's unnecessary to modify this file directly, unless you wish to add new Sass files to your theme.
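For instance, a first pass at restyling usually touches only the variable override files. A sketch (the Bootstrap variable names are standard; the color values and the custom variable are hypothetical):

```scss
// In _theme_sass_variable_overrides.scss: override Bootstrap defaults
$primary: #00695c;      // hypothetical brand color
$link-color: $primary;  // make links match it

// In _theme_css_variable_overrides.scss: add a custom CSS variable for your theme
:root {
  --mytheme-accent: #ffb300;  // note: not prefixed with --ds- or --bs-
}
```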
2. Modifying the default font: By default, DSpace uses Bootstrap's "native font stack", which just uses system UI fonts. However, the font used in
your site can be quickly updated via Bootstrap variables in your theme's _theme_sass_variable_overrides.scss file.
a. One option is to add a new import statement and modify the "$font-family-sans-serif" variable:
// Import the font; 'Source Sans Pro' via Google Fonts is shown here as an example -- substitute your own font's import
@import url('https://fonts.googleapis.com/css2?family=Source+Sans+Pro&display=swap');
// Configure Bootstrap to use this font (and list a number of backup fonts to use on various systems)
$font-family-sans-serif: 'Source Sans Pro', -apple-system, BlinkMacSystemFont, "Segoe UI", "Roboto", "Helvetica Neue", Arial, sans-serif, "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol" !default;
b. If your font requires installing local files, you can do the following:
i. Copy your font file(s) into your theme's assets/fonts/ folder
ii. Create a new ".scss" file specific to your font in that folder, e.g. assets/fonts/open-sans.scss, and use the "@font-face"
CSS rule to load that font:
open-sans.scss
@font-face {
font-family: "Open Sans";
src: url("/assets/fonts/OpenSans-Regular-webfont.woff2") format("woff2"),
url("/assets/fonts/OpenSans-Regular-webfont.woff") format("woff");
}
iii. Then, import that new "open-sans.scss" file and use it in the "$font-family-sans-serif" variable
// Configure Bootstrap to use this font (and list a number of backup fonts to use on various systems)
$font-family-sans-serif: 'Open Sans', -apple-system, BlinkMacSystemFont, "Segoe UI", "Roboto", "Helvetica Neue", Arial, sans-serif, "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol" !default;
c. Keep in mind, as changing the font just requires adjusting Bootstrap Sass variables, there are a lot of Bootstrap guides out there that can help you make more advanced changes.
3. Modifying default color scheme: The colors used in your site can be quickly updated via Bootstrap variables in your theme's _theme_sass_variable_overrides.scss file.
a. Again, you can use entirely Bootstrap variables to adjust color schemes. See the Bootstrap Theming Colors documentation
b. A list of all Bootstrap color variables can be found in the node_modules/bootstrap/scss/_variables.scss file
c. Additional examples can be found in the out-of-the-box "dspace" theme, which adjusts the default Bootstrap colors slightly for both
accessibility & to match the DSpace logo.
4. Any changes require rebuilding your UI. If you are running in "dev mode" (yarn start:dev), then the UI will restart automatically whenever changes
are detected.
Support for theme switching at runtime requires that components use CSS custom properties (which vary at runtime) rather than Sass variables (which are fixed at build time). Thus, Sass variables will be undefined in individual components' stylesheets. The Bootstrap Sass variables are mapped to CSS custom properties for use in these places. For example, $red is mapped to --bs-red and may be referenced as var(--bs-red).
Customize Logo in Header
To change the header logo, edit your theme's app/header/header.component.ts file and swap the "templateUrl" properties so that your theme's copy of header.component.html is used (see below), then follow steps 3-4 to update the HTML itself.
header.component.ts
@Component({
selector: 'ds-header',
// styleUrls: ['header.component.scss'],
styleUrls: ['../../../../app/header/header.component.scss'],
// Uncomment the templateUrl which references the "header.component.html" file in your theme directory
templateUrl: 'header.component.html',
// Comment out the templateUrl which references the default "src/app/header/header.component.html"
file.
//templateUrl: '../../../../app/header/header.component.html',
})
3. Your theme's version of the header.component.html file will be empty by default. Copy over the default HTML code from src/app/header
/header.component.html into your version of this file.
4. Then, modify your copy of header.component.html to use your logo. In this example, we're assuming your theme name is "mytheme" and the logo
file is named "my-logo.svg"
<header>
<div class="container">
<div class="d-flex flex-row justify-content-between">
<a class="navbar-brand my-2" routerLink="/home">
<!-- Modify the logo on the next line -->
<img src="/assets/mytheme/images/my-logo.svg" [attr.alt]="'menu.header.image.logo' | translate"/>
</a>
...
</header>
5. Obviously, you can also make additional modifications to the HTML of the header in this file! You'll also see that the header references several
other DSpace UI components (e.g. <ds-search-navbar> is the search icon in the header). You can easily comment out these tags to disable
them, or move them around to change where that component appears in the header.
6. Any changes require rebuilding your UI. If you are running in "dev mode" (yarn start:dev), then the UI will restart automatically whenever changes
are detected.
7. NOTE: If you have a theme based on the "dspace" theme, be aware that this theme places the header logo in two locations. This allows the "dspace" theme to support a single-line header (whereas the "custom" theme's header is multi-line):
a. The Header component (as described above) is only used on user profile pages
b. The Navbar component (src/app/navbar/navbar.component.html) is used everywhere else. The Navbar component can be
customized in the same way as the Header Component. Just edit the logo path in the "navbar.component.html".
Customize Navigation Links in Header
1. Edit your theme's existing app/navbar/navbar.component.html file. This file defines the entire <nav> which displays the navigation header across the entire DSpace site. While much of the content in this <nav> is loaded dynamically via other Angular components, it is possible to easily add a hardcoded link to the existing header. Find the section of this <nav> which is the <div id="collapsingNav">, as that's where you'll want to add your new link. See inline comments in the example below.
navbar.component.html
<nav>
...
<!-- This DIV is where the header links are added dynamically.
You should see it surrounding all links in the header if you view HTML source -->
<div id="collapsingNav" ... >
<!-- The links themselves are in an unordered list (UL) -->
<ul class="navbar-nav" ... >
...
<!-- Add your new link at the end (or beginning) of this UL in a new LI tag -->
<!-- NOTE: All classes used below are the same Bootstrap CSS classes used by our 'dspace' and
'custom' themes.
You can modify them if the link doesn't look correct in your theme. -->
<li class="nav-item d-flex align-items-center">
<div class="text-md-center">
<a href="http://dspace.org" class="nav-link">DSpace.org</a>
</div>
</li>
</ul>
</div>
</nav>
2. Obviously, you can also make additional modifications to the HTML of the header in this file, as necessary for your navigation header. Keep in
mind though that anything you remove may impact the dynamic content that is pulled into this navigation header.
a. An example is that the header logo for the "dspace" theme also exists in this same file.
3. Any changes require rebuilding your UI. If you are running in "dev mode" (yarn start:dev), then the UI will restart automatically whenever changes
are detected.
Customize Footer
1. First, you'll want to decide if you want to modify just the footer's HTML, or the footer's styles (CSS/Sass), or both.
a. If you want to modify the HTML, you'll need to create a copy of "footer.component.html" in your theme, where you place your changes.
b. If you want to modify the styles, you'll need to create a copy of "footer.component.scss" in your theme, where you place your changes.
2. Edit your theme's app/footer/footer.component.ts file. Swap the "templateUrl" and "styleUrls" properties, based on which you want to
modify in your theme.
footer.component.ts
@Component({
selector: 'ds-footer',
// If you want to modify styles, then...
// Uncomment the styleUrls which references the "footer.component.scss" file in your theme's directory
// and comment out the one that references the default "src/app/footer/footer.component.scss"
styleUrls: ['footer.component.scss'],
//styleUrls: ['../../../../app/footer/footer.component.scss'],
// If you want to modify HTML, then...
// Uncomment the templateUrl which references the "footer.component.html" file in your theme's
directory
// and comment out the one that references the default "src/app/footer/footer.component.html"
templateUrl: 'footer.component.html'
//templateUrl: '../../../../app/footer/footer.component.html'
})
3. Now, based on what you want to modify, you will need to either update your theme's copy of footer.component.html or footer.
component.scss or both.
a. To change footer HTML: Your theme's version of the footer.component.html file will be empty by default. Copy over the default
HTML code from src/app/footer/footer.component.html into your version of this file.
b. To change footer Styles: Your theme's version of the footer.component.scss file will be empty by default. Copy over the default
Sass code from src/app/footer/footer.component.scss into your version of this file.
4. Modify the HTML or Sass as you see fit.
a. If you want to add images, add them to your theme's assets/images folder. Then reference them at the /assets/[theme-name]/images/ URL path.
b. Keep in mind, all Bootstrap variables, utility classes & styles can be used in these files. Take advantage of Bootstrap when you can do
so.
5. DSpace also has an option to display a two-level footer, which is becoming more common these days. By default, DSpace just displays a small, bottom footer. But, you can enable a top footer (above that default footer) by adding this line into your theme's footer.component.ts
footer.component.ts
showTopFooter = true;
This top footer appears in the footer.component.html within a div. Notice the "*ngIf='showTopFooter'", which only shows that div
when that variable is set to "true"
footer.component.html
<footer class="text-lg-start">
<!-- This div and everything within it only displays if showTopFooter=true -->
<div *ngIf="showTopFooter" class="top-footer">
...
</div>
<!-- The bottom footer always displays -->
<div class="bottom-footer ...">
...
</div>
</footer>
6. Any changes require rebuilding your UI. If you are running in "dev mode" (yarn start:dev), then the UI will restart automatically whenever changes
are detected.
Customize Favicon for site or theme
Each theme has the ability to add a set of (attribute-only) HTML tags to the <head> section of the page. This is useful, for example, to change the favicon based on the active theme. Whenever the theme changes, the head tags are reset. A theme can inherit head tags from its parent theme only if it doesn't have any head tags itself. (E.g. theme B extends theme A; if theme B does not have head tags, the head tags of theme A will be used (if any). However, if theme B does have head tags, only the tags from theme B will be used.) If none of the themes in the inheritance hierarchy have head tags configured, the head tags of the default theme (if any) will be used.
Note that a simple hardcoded favicon is set in case no head tags are currently active. The hardcoded favicon is stored at src/assets/images/favicon.ico. This implies that if head tags are added to a theme, the favicon should also be configured explicitly for that theme, else the behavior is undefined.
1. In the "themes" section of your config/config.*.yml configuration file, add (one or more) "headTags", pointing at the favicon file you want to
use. For example:
themes:
  # The default dspace theme
  - name: dspace
    # Whenever this theme is active, the following tags will be injected into the <head> of the page.
    # Example use case: set the favicon based on the active theme.
    headTags:
      # Insert <link rel="icon" href="assets/dspace/images/favicons/favicon.ico" sizes="any"/> into the <head> of the page.
      - tagName: link
        attributes:
          rel: icon
          href: assets/dspace/images/favicons/favicon.ico
          sizes: any
      # Insert <link rel="icon" href="assets/dspace/images/favicons/favicon.svg" type="image/svg+xml"/> into the <head> of the page.
      - tagName: link
        attributes:
          rel: icon
          href: assets/dspace/images/favicons/favicon.svg
          type: image/svg+xml
      # Insert <link rel="apple-touch-icon" href="assets/dspace/images/favicons/apple-touch-icon.png"/> into the <head> of the page.
      - tagName: link
        attributes:
          rel: apple-touch-icon
          href: assets/dspace/images/favicons/apple-touch-icon.png
      # Insert <link rel="manifest" href="assets/dspace/images/favicons/manifest.webmanifest"/> into the <head> of the page.
      - tagName: link
        attributes:
          rel: manifest
          href: assets/dspace/images/favicons/manifest.webmanifest
2. In 7.2 or above, any changes to this configuration just require restarting your site (no rebuild necessary). In 7.1 or 7.0, you must rebuild your site
after modifying the favicon.ico.
3. NOTE: If you specify multiple formats for your favicon (e.g. favicon.svg and favicon.ico), then your browser will select whichever it prefers (e.g. Chrome seems to favor SVG over ICO). However, if you want to force all browsers to use a single favicon, then you may wish to only specify one "icon" format in your headTags section.
Customize Home Page News
1. First, you'll want to decide if you want to modify just the Home Page News HTML, or styles (CSS/Sass), or both.
a. If you want to modify the HTML, you'll need to create a copy of the HTML in "app/home-page/home-news/home-news.component.html"
in your theme. This is where you place your changes.
b. If you want to modify the styles, you'll need to create a copy of the CSS in "app/home-page/home-news/home-news.component.scss" in
your theme. This is where you place your changes.
2. Edit your theme's app/home-page/home-news/home-news.component.ts file. Swap the "templateUrl" and "styleUrls" properties, based on
which you want to modify in your theme.
home-news.component.ts
@Component({
  selector: 'ds-home-news',
  // If you want to modify styles, then...
  // Uncomment the styleUrls which references the "home-news.component.scss" file in your theme's directory
  // and comment out the one that references the default "src/app/home-page/home-news/home-news.component.scss"
  styleUrls: ['./home-news.component.scss'],
  //styleUrls: ['../../../../../app/home-page/home-news/home-news.component.scss'],
  // If you want to modify HTML, then...
  // Uncomment the templateUrl which references the "home-news.component.html" file in your theme's directory
  // and comment out the one that references the default "src/app/home-page/home-news/home-news.component.html"
  templateUrl: './home-news.component.html'
  //templateUrl: '../../../../../app/home-page/home-news/home-news.component.html'
})
3. Now, based on what you want to modify, you will need to either update your theme's copy of home-news.component.html or home-news.
component.scss or both.
a. To change HTML: Your theme's version of the home-news.component.html file will be empty by default. Copy over the default HTML
code from src/app/home-page/home-news/home-news.component.html into your version of this file.
b. To change Styles: Your theme's version of the home-news.component.scss file will be empty by default. Copy over the default Sass
code from src/app/home-page/home-news/home-news.component.scss into your version of this file.
4. Modify the HTML or Sass as you see fit.
a. If you want to add images, add them to your theme's assets/images folder. Then reference them at the /assets/[theme-name]
/images/ URL path.
b. Keep in mind, all Bootstrap variables, utility classes & styles can be used in these files. Take advantage of Bootstrap when you can do
so.
5. Any changes require rebuilding your UI. If you are running in "dev mode" (yarn start:dev), then the UI will restart automatically whenever changes
are detected.
Normal Item: The code for the simple Item page for a normal Item (i.e. not an Entity) can be found in the source code at "src/app/item-page
/simple/item-types/untyped-item/"
Publication Entity: If you are wanting to modify the display of Publication Entities, it has separate source code under "src/app/item-page
/simple/item-types/publication/"
Here are the basics of modifying this page. The examples below assume you are working with a normal Item, but the same logic applies to modifying
the Publication pages (you'd just need to modify their HTML/CSS instead).
1. First, you'll want to decide if you want to modify just the Item Page HTML, or styles (CSS/Sass), or both.
a. If you want to modify the HTML, you'll need to create a copy of the HTML in "src/app/item-page/simple/item-types/untyped-
item/untyped-item.component.html" in your theme. This is where you place your changes.
b. If you want to modify the styles, you'll need to create a copy of the CSS in "src/app/item-page/simple/item-types/untyped-
item/untyped-item.component.scss" in your theme. This is where you place your changes.
2. Edit your theme's app/item-page/simple/item-types/untyped-item/untyped-item.component.ts file. Swap the "templateUrl" and
"styleUrls" properties, based on which you want to modify in your theme. Also, MAKE SURE the "@listableObjectComponent" is using your
theme... the last parameter should be the name of your theme!
untyped-item.component.ts
// MAKE SURE that the final parameter here is the name of your theme. This one assumes your theme is named "custom".
@listableObjectComponent(Item, ViewMode.StandalonePage, Context.Any, 'custom')
@Component({
  selector: 'ds-untyped-item',
  // If you want to modify styles, then...
  // Uncomment the styleUrls which references the "untyped-item.component.scss" file in your theme's directory
  // and comment out the one that references the default in "src/app/"
  styleUrls: ['./untyped-item.component.scss'],
  //styleUrls: ['../../../../../../../app/item-page/simple/item-types/untyped-item/untyped-item.component.scss'],
  // If you want to modify HTML, then...
  // Uncomment the templateUrl which references the "untyped-item.component.html" file in your theme's directory
  // and comment out the one that references the default "src/app/"
  templateUrl: './untyped-item.component.html',
  //templateUrl: '../../../../../../../app/item-page/simple/item-types/untyped-item/untyped-item.component.html',
})
3. Now, based on what you want to modify, you will need to either update your theme's copy of untyped-item.component.html or untyped-
item.component.scss or both.
a. To change HTML: Your theme's version of the untyped-item.component.html file may be empty by default. Copy over the default
HTML code from src/app/item-page/simple/item-types/untyped-item/untyped-item.component.html into your version of
this file.
b. To change Styles: Your theme's version of the untyped-item.component.scss file may be empty by default. Copy over the default
Sass code from src/app/item-page/simple/item-types/untyped-item/untyped-item.component.scss into your version of
this file.
4. In the HTML of the Simple Item page, you'll see a number of custom "ds-" HTML-like tags which make displaying individual metadata fields
easier. These tags make it easier to add/remove metadata fields on this page.
a. <ds-generic-item-page-field> - This tag can be used to display the value of any metadata field (as a string).
i. Put the name of the metadata field in the "[fields]" attribute... see existing fields as an example.
ii. Put the i18n label you want to use for this field in the "[label]" attribute. Again, see existing fields as an example. This i18n tag
MUST then be added to your "src/assets/i18n/en.json5" file (or corresponding file depending on your language)
iii. For example, you could add a new ds-generic-item-page-field which displays the value of the "dc.title.alternative" field,
with its label defined in your i18n file.
b. <ds-item-page-uri-field> - This tag can be used to display the value of any metadata field as an HTML link. (The value must
already be a URL)
i. This field has the same attributes as <ds-generic-item-page-field> above.
c. Some specific tags exist for other fields (e.g. <ds-item-page-date-field> and <ds-item-page-abstract-field>). These are
currently separate components under "src/app/item-page/simple/field-components/specific-field/" directories. They
are hardcoded to use a specific metadata field and label, but you could customize the components in that location.
5. To add new fields, just add new "<ds-generic-item-page-field>" tags (or similar). To remove fields, just comment them out or remove the
HTML. You can also restructure the columns on that page using simple HTML and Bootstrap CSS.
6. Any changes require rebuilding your UI. If you are running in "dev mode" (yarn start:dev), then the UI will restart automatically whenever changes
are detected.
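As an illustrative sketch of step 4a above, a new field row for "dc.title.alternative" might look like the following. The [label] i18n key shown here is a hypothetical name you would need to define in your en.json5 file; the [item]="object" binding follows the pattern of the existing fields in that template.

```html
<!-- Hypothetical example: display any "dc.title.alternative" values.
     The i18n key 'item.page.title-alternative' is an assumed name; define it in your i18n file. -->
<ds-generic-item-page-field [item]="object"
  [fields]="['dc.title.alternative']"
  [label]="'item.page.title-alternative'">
</ds-generic-item-page-field>
```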
NOTE: If your changes to the simple item page don't appear to be working, make sure that you have updated the eager-theme.module.ts to load your
custom theme as described in the "Getting Started" section above! This small change is REQUIRED for the untyped-item component to work in a custom
theme. You also may wish to restart your dev server to ensure nothing is being cached.
1. Configure your theme to use its copies of files: Modify the corresponding *.component.ts in your theme.
a. If you want to modify component style, replace the "styleUrls" in that file to point at the copy of *.component.scss in your theme.
b. If you want to modify component HTML, replace the "templateUrl" in that file to point at the copy of *.component.html in your theme.
2. Copy the default UI code into your theme file(s)
a. If you want to modify component style, copy the default *.component.scss code (from src/app/) into your theme's component.
scss file.
b. If you want to modify component HTML, copy the default *.component.html code (from src/app/) into your theme's component.
html file.
3. Modify those theme-specific files
a. If you want to add images, add them to your theme's assets/images folder. Then reference them at the /assets/[theme-name]
/images/ URL path.
b. Keep in mind, all Bootstrap variables, utility classes & styles can be used in these files. Take advantage of Bootstrap when you can do
so.
4. Remember to either rebuild the UI after each change, or run in dev mode (yarn start:dev) while you are doing theme work.
If you would like to change the text displayed in the UI, you will need to edit the i18n translation files. There are two approaches you can take:
The following "theme override" approach to capture i18n changes within a theme is only supported in DSpace 7.1 or above.
While editing the default i18n files directly is effective, the recommended approach is to capture i18n changes in your theme. This ensures that your
changes to the default values are easy to find and review and also removes the risk of losing your changes when upgrading to newer versions of DSpace.
There is an example of this configuration in the custom theme, which you can find in src/themes/custom/assets/i18n.
Once you have changes in place within your theme, they need to be applied by executing a script:
The merge-i18n script will merge the changes captured in your theme with the default settings, resulting in updated versions of the default i18n files. Any
setting you included in your theme will override the default setting. Any new properties will be added. Files will be merged based on file name, so en.json5
in your theme will be merged with the en.json5 file in the default i18n directory.
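For example, in recent DSpace 7 releases the script is typically invoked via yarn, pointing at your theme's i18n folder. The command name and -s flag are assumed from the dspace-angular package scripts; verify against the package.json in your version.

```shell
yarn merge-i18n -s src/themes/custom/assets/i18n
```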
Themes can extend other themes using the "extends" configuration. See User Interface Configuration for more examples.
Extending another theme means that you inherit all the settings of the extended theme. So, if the current theme does NOT specify a component style, its
ancestor theme(s) will be checked recursively for their styles before falling back to the default. In other words, this "extends" setting allows a theme to
inherit all styles/components from the extended theme, and only override those styles/components you wish to override.
themes:
  # grandchild theme
  - name: custom-A
    extends: custom-B
    handle: '10673/34'
  # child theme
  - name: custom-B
    extends: custom
    handle: 10673/2
  # default theme
  - name: custom
Format for 7.1 or 7.0 (environment.*.ts)
themes: [
  // grandchild theme
  {
    name: 'custom-A',
    extends: 'custom-B',
    handle: '10673/34',
  },
  // child theme
  {
    name: 'custom-B',
    extends: 'custom',
    handle: '10673/2',
  },
  // default theme
  {
    name: 'custom',
  },
],
When the object at Handle '10673/2' (and any child objects) is viewed, the 'custom-B' theme will be used. By default, you'll have the same styles
as the extended 'custom' theme. However, you can override individual styles in your 'custom-B' theme.
When the object at Handle '10673/34' (and any child objects) is viewed, the 'custom-A' theme will be used. By default, your overall theme will be
based on the 'custom' theme (in this case a "grandparent" theme). But, you can override those styles in your 'custom-B' theme or 'custom-A'
theme.
The order of priority is 'custom-A', then 'custom-B', then 'custom'. If a style/component is in 'custom-A' it will be used. If not, 'custom-B'
will be checked and if it's there, that version will be used. If not in either 'custom-A' or 'custom-B', then the style/component from
'custom' will be used. If the style/component is not in ANY of those themes, then the default (base theme) style will be used.
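The lookup order described above can be sketched in simplified form. The following is a hypothetical model for illustration only (the theme names match the example configuration above, but this is not DSpace's actual resolution code):

```typescript
// Minimal sketch of "extends"-based theme resolution (illustrative model, not DSpace's real implementation).
interface ThemeConfig {
  name: string;
  extends?: string;
  components: Record<string, string>;
}

const themes: ThemeConfig[] = [
  { name: 'custom-A', extends: 'custom-B', components: { header: 'A-header' } },
  { name: 'custom-B', extends: 'custom', components: { footer: 'B-footer' } },
  { name: 'custom', components: { header: 'base-header', footer: 'base-footer', nav: 'base-nav' } },
];

// Walk the "extends" chain until some theme in the chain provides the requested component.
function resolveComponent(themeName: string, component: string): string | undefined {
  const theme = themes.find(t => t.name === themeName);
  if (!theme) {
    return undefined;
  }
  return theme.components[component]
    ?? (theme.extends ? resolveComponent(theme.extends, component) : undefined);
}

console.log(resolveComponent('custom-A', 'header')); // 'A-header'  (found in custom-A itself)
console.log(resolveComponent('custom-A', 'footer')); // 'B-footer'  (inherited from custom-B)
console.log(resolveComponent('custom-A', 'nav'));    // 'base-nav'  (inherited from custom)
```

If the component is not found in any theme along the chain, the base code (under ./src/app) would be used, which the sketch models by returning undefined.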
1. First, copy the Angular Component directory in question from the "Custom" theme folder (src/themes/custom) into your theme's folder. NOTE: at
this time, not all components are theme-able. So, if it doesn't exist in the "Custom" theme folder, then it may not be possible to theme.
a. For example, if you wanted to add the Footer Component to your theme, it can be found in the "Custom" theme at "src/themes/custom
/app/footer".
b. Copy that entire folder into your theme folder, retaining the same relative path. For example, to add the Footer Component, copy "src
/themes/custom/app/footer" (and all contents) into "src/themes/[your-theme]/app/footer".
2. Now, you need to "register" that component in one of your theme's module files: lazy-theme.module.ts or eager-theme.module.ts. For
performance, it's best to put as many components as possible into lazy-theme.module.ts, since that means they'll only be downloaded if they're
needed. Components in eager-theme.module.ts are included in the initial JS download for the app, so you should only add components there that
are necessary on every page, such as the header and footer; these should be added to the DECLARATIONS array. You should also eagerly include
components that use one of our custom decorators (such as @listableObjectComponent), because those decorators need to be registered when the
app starts in order to be picked up. These should be added to the ENTRY_COMPONENTS array, which will both declare them and ensure
they're loaded when the app starts.
3. Add an import of the new component file, or copy the corresponding import from "src/themes/custom/lazy-theme.module.ts" or "src/themes
/custom/eager-theme.module.ts". For example, the Footer Component import can be found in "src/themes/custom/eager-theme.module.ts" and
looks like this:
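A sketch of that import (the exact path should be verified against the "Custom" theme's eager-theme.module.ts in your checkout, as it may differ between DSpace versions):

```typescript
import { FooterComponent } from './app/footer/footer.component';
```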
4. In that same module file, also add this imported component to the "DECLARATIONS" section. (Again, you can optionally look in the custom
theme's module files to see how it's done.) For example, the Footer Component would then be added to the list of DECLARATIONS (the order of
the declarations list doesn't matter):
const DECLARATIONS = [
  ...
  FooterComponent,
  ...
];
5. At this point, you should rebuild/restart your UI to ensure nothing has broken. If you did everything correctly, no build errors will occur. Generally
speaking, it's best to add Components one by one, rebuilding in between.
6. Now, you can customize your newly added Component by following the "Customizing Other Components in your Theme" instructions above.
The main advantage to keeping your theme simple/small is that it can make future upgrades easier. Generally speaking, the fewer components you have
in your theme, the less likely your theme will need modification in a future upgrade (as generally your theme may require updates if one of the components
it references underwent structural/major changes).
1. First you MUST remove all references to that directory/component from your theme's lazy-theme.module.ts and eager-theme.module.ts
files.
a. For example, to delete the "./app/login-page" directory, you'd want to find which component(s) use that directory in your lazy-theme.
module.ts file.
b. If you search that file, you'd find this reference:
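A sketch of the reference in question (the path is assumed from the "Custom" theme layout; verify against your own lazy-theme.module.ts):

```typescript
import { LoginPageComponent } from './app/login-page/login-page.component';
```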
c. That means you not only need to remove that "import" statement. You'd also need to remove all other references to
"LoginPageComponent" in that same lazy-theme.module.ts file. So, you'd also need to remove it from the DECLARATIONS
section:
const DECLARATIONS = [
  ...
  LoginPageComponent,
  ...
];
Simply view the HTML source of the page and look for the "data-used-theme" attribute. This attribute tells you which named theme matched that HTML
element. A value of "base" means the core or "base" code (under ./src/app) was used.
For example:
HTML source
<!-- This example shows the theme named "dspace" was used for the "themed-header-navbar-wrapper.component.ts" -->
<ds-themed-header-navbar-wrapper ... data-used-theme="dspace"></ds-themed-header-navbar-wrapper>
<main>
  <!-- But, on the same page, the theme named "base" (core code) was used for the "themed-breadcrumbs.component.ts" -->
  <ds-themed-breadcrumbs ... data-used-theme="base"></ds-themed-breadcrumbs>
</main>
</main>
"Getting Started with DSpace 7.0" Basic Workshop at OR2021 Conference
Bootstrap Documentation - DSpace's UI strives to be compliant with "out-of-the-box" Bootstrap as much as possible. Therefore, Bootstrap
knowledge is very beneficial in customizing DSpace.
Sass Documentation - Bootstrap and DSpace both use Sass to enhance your ability to customize styles quickly via variables, etc. Some
familiarity with Sass is recommended, though you need not be an expert.
User Interface Debugging
This page provides some basic guidelines for debugging issues that may occur in the User Interface when following the User Interface Customization
guide or similar.
How to find what the User Interface is doing when you click something
Finding which component is generating the content on a page
How to find what the User Interface is doing when you click something
HINT: This is very similar to our Troubleshoot an error guide.
If you want to determine what action the User Interface is taking when you click a button/link, you can find that information in your browser's Developer
Tools.
Finding which component is generating the content on a page
Suppose you are trying to determine which component is generating part of a DSpace page.
1. View the HTML source of the page in your browser. Search for that section of the page. (Or, right click on that part of the page and select
"Inspect")
a. For example, on the homepage view the source of the "Communities in DSpace" heading
2. Look for a parent HTML tag that begins with "ds-". This is the component selector!
a. Continuing the example, if you view the source of the "Communities in DSpace" heading, you'll see something like this (all HTML
attributes have been removed to make the example more simple):
<ds-top-level-community-list>
  <div>
    <h2> Communities in DSpace </h2>
    <p>Select a community to browse its collections.</p>
  </div>
</ds-top-level-community-list>
b. Based on the above HTML source, you can see that the "Communities in DSpace" header/content is coming from a component whose
selector is "ds-top-level-community-list".
3. Now, search the source code (./src/app/) directories for a ".component.ts" file which includes that "ds-" tag name. This can most easily be done in
an IDE, but can also be done using command-line tools such as grep.
a. Continuing the example, if you search the ./src/app/ directories for "ds-top-level-community-list" you'll find a match in the "src/app
/home-page/top-level-community-list/top-level-community-list.component.ts" file:
@Component({
  selector: 'ds-top-level-community-list',
  ...
})
b. This lets you know that to modify the display of that section of the page, you may need to edit either the "top-level-community-
list.component.ts" file or its corresponding HTML file at "top-level-community-list.component.html".
4. Once you've located the component, you can edit that component's HTML file (ending in "component.html") to change that section of the page.
a. Keep in mind, the component's HTML file may reference other "ds-" tags! Those are other components in DSpace which you can find
again by searching the "./src/app" directories for that tag.
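For example, the search in step 3 could be performed from the command line roughly like this (assuming you are in the root of a dspace-angular checkout):

```shell
grep -r --include="*.component.ts" "ds-top-level-community-list" ./src/app/
```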
Accessibility
Accessibility Statement
Conformance status
How we test for accessibility
Known limitations
Report accessibility issues
Accessibility Statement
DSpace is an international, open-source digital repository application that aspires to be as inclusive as possible for all users, including people with
disabilities. As a community of users and developers who build and maintain this application, we are dedicated to creating an accessible and interoperable
user interface. We are guided by the recommendations of the Web Content Accessibility Guidelines (WCAG) and we continually strive to meet and
exceed these standards.
Conformance status
The Web Content Accessibility Guidelines (WCAG) defines requirements for designers and developers to improve accessibility for people with disabilities.
It defines three levels of conformance: Level A, Level AA, and Level AAA.
DSpace strives to conform with the current version of WCAG level AA. However, we acknowledge that achieving full accessibility is a work in
progress at this time.
We use design principles and coding standards informed by accessibility concerns as documented in User Interface Design Principles &
Accessibility.
We run automated accessibility scanning tools (Axe by Deque) across the user interface in our end-to-end tests (run via Cypress). These
automated tests run for every GitHub pull request submitted to our user interface codebase.
We ask institutions who use DSpace to share any of their own accessibility testing results with DSpace developers. Accessibility issues
discovered are turned into bug tickets for developers to address in upcoming DSpace releases.
If your institution has accessibility testing results to share, please contact Tim Donohue or anyone on our DSpace Steering Group.
In 2021, we conducted an accessibility audit of the DSpace application with Deque to get specific feedback on our accessibility conformance.
Their feedback has guided our design and coding standards mentioned above.
Known limitations
Despite our best efforts to ensure accessibility of DSpace, there may be some limitations. Below is a description of known limitations:
1. We track all known DSpace accessibility issues in our GitHub issue tracker with the "accessibility" label.
2. DSpace development is primarily volunteer-based, and therefore some accessibility tickets may be waiting on a volunteer to claim them. While
we do our best to ensure critical issues are addressed quickly, non-critical issues may not receive attention until a volunteer gets to them. We
accept code contributions from anyone (in the form of GitHub Pull Requests).
a. If an issue is important to you and you have developers on staff (or can hire a service provider), please consider contributing a fix back to
DSpace. Please claim open tickets by commenting on the issue ticket - this ensures that no other institutions will duplicate efforts.
3. Since the DSpace User Interface allows users to upload content, we cannot ensure the accessibility of user contributions. DSpace has some
features that allow administrators to make uploaded content more accessible, but some limitations exist:
a. The MediaViewer (used to view video/audio content) supports subtitles/captioning. However, at this time, the WebVTT captioning files must
be uploaded separately alongside the original video.
b. At this time, DSpace does not support custom alternative text (alt text) for either thumbnail images (generated from uploaded files) or
Community/Collection logos.
Report accessibility issues
When reporting an accessibility issue, it is helpful to include the following:
What is the accessibility issue you've found? If you know of a way to fix the issue, please include it as well.
Which page(s) of the DSpace web application can this issue be found on? For example, provide the URL of the page or a description of how to
get to that page.
How could someone reproduce this issue? For example, what tool or browser plugin did you use when you found this issue? If the issue is
browser-specific, also note which browser(s) are affected.
If possible, provide links/screenshots to document the issue or potential fixes. This might include a screenshot showing the issue, a link to WCAG
describing the issue or a description from an internal accessibility audit.
We also welcome contributions / accessibility fixes from anyone. If you've found a way to fix the issue, please submit a GitHub pull request to our
codebase. Service providers are also available for hire to fix issues and donate them back to the DSpace codebase.
Browse
Browse By Subject Category
You can search for specific values by using the search bar on top of the tree and clicking "Search". Clicking "Reset" will not only reset the tree itself, but
also the values you previously selected.
After you're done (de)selecting values, click "Browse". This will redirect you to the search page, where your selected values are used as search filters:
If one value was selected, the search results will consist of every item which has that value in its dc.subject metadata field.
If multiple values were selected, the search results will consist of the items which have all of the selected values in their dc.subject metadata field.
(E.g. if you selected TECHNOLOGY and MEDICINE, only items with both subjects will show up.)
To configure Browse by Subject Category options, see "Hierarchical Browse Indexes" in the Configuration Reference.
Discovery
Although these techniques are new in DSpace, they might feel familiar from other platforms like Aquabrowser or Amazon, where facets help you to select
the right product according to facets like price and brand. DSpace Discovery offers very powerful browse and search configurations that were only possible
with code customization in the past.
Since 6.0, Discovery is the only out-of-the-box Search and Browse infrastructure provided in DSpace.
When you have successfully enabled Discovery in your DSpace, you will notice that the different enabled facets are visualized in a "Discover" section in
your sidebar, by default, right below the Browse options.
In this example, there are 3 Sidebar Facets: Author, Subject and Date Issued. It's important to know that multiple metadata fields can be included in one
facet. For example, the Author facet above includes values from both dc.contributor.author as well as dc.creator.
Another important property of Sidebar Facets is that their contents are automatically updated to the context of the page. On collection homepages or
community homepages it will include information about the items included in that particular collection or community.
In a faceted search, a user can modify the list of displayed search results by specifying additional "filters" that will be applied to the list of search results. In
DSpace, a filter is a "contains" condition applied to a specific facet. In the example below, a user started with the search term "health", which yielded 500
results. After applying the filter "public" on the facet "Subject", only 227 results remain. Each time a user selects a sidebar facet, it will be added as a filter.
Active filters can be altered or removed in the 'filters' section of the search interface.
Another example: Using the standard search, a user would search for something like [wetland + "dc.author=Mitsch, William J" + dc.subject="water
quality" ]. With filtered search, they can start by searching for [wetland ], and then filter the results by the other attributes, author and subject.
This is a classic "tag cloud" facet in a DSpace repository.
Configuration files
The configuration for discovery is located in 2 separate files.
Property: discovery.search.server

Example Value: discovery.search.server = http://localhost:8080/solr/search

Informational Note: Discovery relies on a Solr index for storage and retrieval of its information. This parameter determines the location of the Solr index.
If you are uncertain whether this property is set correctly, you can use a command-line tool like "wget" to perform a query against the Solr index
(and ensure Solr responds). For example, the query below searches the Solr index for "test" and returns the response on standard out:

wget -O - http://localhost:8080/solr/search/select?q=test
Property: discovery.index.authority.ignore[.field]

Example Values:
discovery.index.authority.ignore = true
discovery.index.authority.ignore.dc.contributor.author = false

Informational Note: By default, Discovery will use the authority information in the metadata to disambiguate homonyms. Setting this property to true will
make the indexing process behave as if the metadata didn't include authority information. The configuration can differ on a per-field (<schema>.
<element>.<qualifier>) basis. Setting the property without a field changes the default value.
Property: discovery.browse.authority.ignore[.browse-index]

Example Values:
discovery.browse.authority.ignore = true
discovery.browse.authority.ignore.author = false

Informational Note: Similar to "discovery.index.authority.ignore", except specific to the "Browse By" indexes. By default, Discovery will use the authority
information in the metadata to disambiguate homonyms. Setting this property to true will make the indexing process behave as if the metadata didn't
include authority information. The configuration can differ on a per-browse-index basis. Setting the property without a browse index changes the
default value.
Property: discovery.index.authority.ignore-prefered[.field]

Example Values:
discovery.index.authority.ignore-prefered = true
discovery.index.authority.ignore-prefered.dc.contributor.author = false

Informational Note: By default, Discovery will use the authority information in the metadata to query the authority for the preferred label. Setting this
property to true will make the indexing process behave as if the metadata didn't include authority information (i.e. the preferred form is the one recorded
in the metadata value). The configuration can differ on a per-field (<schema>.<element>.<qualifier>) basis. Setting the property without a field
changes the default value. If the authority is a remote service, disabling this feature can greatly improve performance.
Property: discovery.browse.authority.ignore-prefered[.browse-index]

Example Values:
discovery.browse.authority.ignore-prefered = true
discovery.browse.authority.ignore-prefered.author = false

Informational Note: Similar to "discovery.index.authority.ignore-prefered", except specific to the "Browse By" indexes. By default, Discovery will use the
authority information in the metadata to query the authority for the preferred label. Setting this property to true will make the indexing process behave as
if the metadata didn't include authority information (i.e. the preferred form is the one recorded in the metadata value). The configuration can differ on a
per-browse-index basis. Setting the property without a browse index changes the default value. If the authority is a remote service, disabling this
feature can greatly improve performance.
Property: discovery.index.authority.ignore-variants[.field]

Example Values:
discovery.index.authority.ignore-variants = true
discovery.index.authority.ignore-variants.dc.contributor.author = false

Informational Note: By default, Discovery will use the authority information in the metadata to query the authority for variants. Setting this property to
true will make the indexing process behave as if the metadata didn't include authority information. The configuration can differ on a per-field (<schema>.
<element>.<qualifier>) basis. Setting the property without a field changes the default value. If the authority is a remote service, disabling this feature
can greatly improve performance.
Property: discovery.browse.authority.ignore-variants[.browse-index]

Example Values:
discovery.browse.authority.ignore-variants = true
discovery.browse.authority.ignore-variants.author = false

Informational Note: Similar to "discovery.index.authority.ignore-variants", except specific to the "Browse By" indexes. By default, Discovery will use the
authority information in the metadata to query the authority for variants. Setting this property to true will make the indexing process behave as if the
metadata didn't include authority information. The configuration can differ on a per-browse-index basis. Setting the property without a browse index
changes the default value. If the authority is a remote service, disabling this feature can greatly improve performance.
If you add new browse fields, you should coordinate changes here with the message catalog(s) in the UI. You will need to add several entries:

key — value
browse.comcol.by.* — labels the browse field in the Browse menu of a community or collection page
menu.section.browse_global_by_* — labels the browse field in the "browse" dropdown at the top of the page
browse.metadata.* — labels the browse field in the body of the browsing page

If you add new search fields, sorts, etc., you should likewise coordinate changes here with the message catalog(s) in the UI. For example, you will need to add search.filters.filter.NAME.head (where NAME is the indexFieldName of the field definition) to label a new search field.
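As a sketch, suppose you add a new search filter whose indexFieldName is "conference" (a hypothetical name): the UI message catalog (e.g. src/assets/i18n/en.json5) would then need an entry such as:

```json5
{
  // Hypothetical key for a new search filter with indexFieldName "conference"
  "search.filters.filter.conference.head": "Conference",
}
```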
Structure Summary
This file is in XML format. You should be familiar with XML before editing this file. The configurations are organized together in beans, depending on the
purpose these properties are used for.
This purpose can be derived from the class of the beans. Here's a short summary of classes you will encounter throughout the file and what the
corresponding properties in the bean are used for.
Download the configuration file and review it together with the following parameters.
Class: DiscoveryConfigurationService
Purpose: Defines the mapping between separate Discovery configurations and individual collections/communities
Default: All communities, collections and the homepage (key=default) are mapped to defaultConfiguration. Also controls the metadata fields that should not be indexed in the search core (item provenance, for example).

Class: DiscoveryConfiguration
Purpose: Groups configurations for sidebar facets, search filters, search sort options and recent submissions

Class: DiscoverySearchFilter
Purpose: Defines that specific metadata fields should be enabled as a search filter
Default: dc.title, dc.contributor.author, dc.creator, dc.subject.* and dc.date.issued are defined as search filters

Class: DiscoverySearchFilterFacet
Purpose: Defines which metadata fields should be offered as contextual sidebar browse options; each of these facets must also be a search filter

Class: HierarchicalSidebarFacetConfiguration
Purpose: Defines which metadata fields contain hierarchical data and should be offered as a contextual sidebar option

Class: DiscoverySortConfiguration
Default: dc.title and dc.date.issued are defined as alternatives for sorting, other than Relevance (hard-coded)

Class: DiscoveryHitHighlightingConfiguration
Purpose: Defines which metadata fields can contain hit highlighting & search snippets
Default: dc.title, dc.contributor.author, dc.subject, dc.description.abstract & full text from text files.

Class: TagCloudFacetConfiguration
Purpose: Defines the tag cloud appearance configuration bean and the search filter facets to appear in the tag cloud form. You can have a different "TagCloudFacetConfiguration" per community, per collection, or for the home page
Default settings
In addition to the summarized descriptions of the default values, the following details help you to better understand these defaults. If you haven't already done so, download the configuration file and review it together with the following parameters.
The file contains one default configuration that defines the following sidebar facets, search filters, sort fields and recent submissions display:
Sidebar facets
searchFilterAuthor: groups the metadata fields dc.contributor.author & dc.creator with a facet limit of 10, sorted by occurrence count
searchFilterSubject: groups all subject metadata fields (dc.subject.*) with a facet limit of 10, sorted by occurrence count
searchFilterIssued: contains the dc.date.issued metadata field, which is identified with the type "date" and sorted by specific date values
Search filters
searchFilterTitle: contains the dc.title metadata field
searchFilterAuthor: contains the dc.contributor.author & dc.creator metadata fields
searchFilterSubject: contains the dc.subject.* metadata fields
searchFilterIssued: contains the dc.date.issued metadata field with the type "date"
Sort fields
sortTitle: contains the dc.title metadata field
sortDateIssued: contains the dc.date.issued metadata field; this sort has the type date configured.
defaultFilterQueries
The default configuration contains no defaultFilterQueries
The default filter queries are disabled by default but there is an example in the default configuration in comments which allows discovery
to only return items (as opposed to also communities/collections).
Recent Submissions
The recent submissions are sorted by dc.date.accessioned, which is a date, and a maximum of 5 recent submissions are displayed.
Hit highlighting
The fields dc.title, dc.contributor.author & dc.subject can contain hit highlighting.
The dc.description.abstract & full text field are used to render search snippets.
Non indexed metadata fields
Community/Collections: dc.rights (copyright text)
Items: dc.description.provenance
Many of the properties contain lists that use references to point to the configuration elements. This way a certain configuration type can be used in multiple
discovery configurations so there is no need to duplicate them.
<property name="toIgnoreMetadataFields">
<map>
<entry>
<key><util:constant static-field="org.dspace.core.Constants.COMMUNITY"/></key>
<list>
<!--Introduction text-->
<!--<value>dc.description</value>-->
<!--Short description-->
<!--<value>dc.description.abstract</value>-->
<!--News-->
<!--<value>dc.description.tableofcontents</value>-->
<!--Copyright text-->
<value>dc.rights</value>
<!--Community name-->
<!--<value>dc.title</value>-->
</list>
</entry>
<entry>
<key><util:constant static-field="org.dspace.core.Constants.COLLECTION"/></key>
<list>
<!--Introduction text-->
<!--<value>dc.description</value>-->
<!--Short description-->
<!--<value>dc.description.abstract</value>-->
<!--News-->
<!--<value>dc.description.tableofcontents</value>-->
<!--Copyright text-->
<value>dc.rights</value>
<!--Collection name-->
<!--<value>dc.title</value>-->
</list>
</entry>
<entry>
<key><util:constant static-field="org.dspace.core.Constants.ITEM"/></key>
<list>
<value>dc.description.provenance</value>
</list>
</entry>
</map>
</property>
By adding additional values to the appropriate lists, additional metadata can be excluded from the search core. A reindex is required after altering this file to ensure that the values are removed from the index.
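For example, a full rebuild of the Discovery index can be triggered from the command line (replace [dspace] with your installation directory; the -b flag requests a rebuild from scratch):

```
# Rebuild the Discovery (Solr) index from scratch after altering this file
[dspace]/bin/dspace index-discovery -b
```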
The id & class attributes are mandatory for this type of bean. The properties that it contains are discussed below.
indexFieldName (Required): A unique search filter name; the metadata will be indexed in Solr under this field name.
metadataFields (Required): A list of the metadata fields that need to be included in the facet.
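As a sketch, a plain search filter bean modeled on the default searchFilterTitle definition might look like the following (the id and field values mirror the defaults described elsewhere in this section; verify them against your own discovery.xml):

```xml
<bean id="searchFilterTitle" class="org.dspace.discovery.configuration.DiscoverySearchFilter">
    <!-- Solr field name under which this filter's metadata is indexed -->
    <property name="indexFieldName" value="title"/>
    <!-- Metadata fields included in this filter -->
    <property name="metadataFields">
        <list>
            <value>dc.title</value>
        </list>
    </property>
</bean>
```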
Sidebar facets extend the search filter and add some extra properties to it. Below is an example of a search filter that is also used as a sidebar facet.
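The example itself appears to be missing from this copy of the text; a plausible sketch, modeled on the default author facet described earlier (facet limit 10, sorted by occurrence count), is:

```xml
<bean id="searchFilterAuthor" class="org.dspace.discovery.configuration.DiscoverySearchFilterFacet">
    <property name="indexFieldName" value="author"/>
    <property name="metadataFields">
        <list>
            <value>dc.contributor.author</value>
            <value>dc.creator</value>
        </list>
    </property>
    <!-- Facet-only properties added on top of the plain search filter -->
    <property name="facetLimit" value="10"/>
    <property name="sortOrder" value="COUNT"/>
    <property name="type" value="text"/>
</bean>
```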
Note that the class has changed from DiscoverySearchFilter to DiscoverySearchFilterFacet. This is needed to support the extra properties.
facetLimit (optional): The maximum number of values to be shown by default. This property is optional, if none is specified the default value "10"
will be used. If the filter has the type date, this property will not be used since dates are automatically grouped together.
sortOrder (optional): The sort order for the sidebar facets; it can either be COUNT or VALUE. The default value is COUNT.
COUNT Facets will be sorted by the number of times they appear in the repository
VALUE Facets will be sorted alphabetically
type (optional): the type of the sidebar facet; it can either be "date" or "text". "text" is the default value.
text: The facets will be treated as is (DEFAULT)
date: Only the year will be stored in the Solr index. These years are automatically displayed in ranges that get smaller when you select
one.
<bean id="searchFilterSubject" class="org.dspace.discovery.configuration.HierarchicalSidebarFacetConfiguration">
<property name="indexFieldName" value="subject"/>
<property name="metadataFields">
<list>
<value>dc.subject</value>
</list>
</property>
<property name="sortOrder" value="COUNT"/>
<property name="splitter" value="::"/>
<property name="skipFirstNodeLevel" value="false"/>
</bean>
Note that the class has changed from SidebarFacetConfiguration to HierarchicalSidebarFacetConfiguration. This is needed to support the extra
properties.
The id and class attributes are mandatory for this type of bean. The properties that it contains are discussed below.
DiscoveryConfiguration
The DiscoveryConfiguration groups configurations for sidebar facets, search filters, search sort options and recent submissions. If you want to show the
same sidebar facets, use the same search filters, search options and recent submissions everywhere in your repository, you will only need one
DiscoveryConfiguration and you might as well just edit the defaultConfiguration.
The DiscoveryConfiguration makes it very easy to use custom sidebar facets, search filters, etc. on specific community or collection homepages. This is particularly useful if your collections are heterogeneous. For example, in a collection with conference papers, you might want to offer a sidebar facet for conference date, which might be more relevant than the actual issued date of the proceedings. In a collection with papers, you might want to offer a facet for funding bodies or publisher, while these fields are irrelevant for items like learning objects.
Below is an example of how one of these lists can be configured. It's important that each of the bean references corresponds to the exact name of the
earlier defined facets, filters or sort options.
Each sidebar facet must also occur in the list of the search filters.
<property name="sidebarFacets">
<list>
<ref bean="sidebarFacetAuthor" />
<ref bean="sidebarFacetSubject" />
<ref bean="sidebarFacetDateIssued" />
</list>
</property>
<property name="searchSortConfiguration">
<bean class="org.dspace.discovery.configuration.DiscoverySortConfiguration">
<!--<property name="defaultSort" ref="sortDateIssued"/>-->
<!--DefaultSortOrder can either be desc or asc (desc is default)-->
<property name="defaultSortOrder" value="desc"/>
<property name="sortFields">
<list>
<ref bean="sortTitle" />
<ref bean="sortDateIssued" />
</list>
</property>
</bean>
</property>
The property name & the bean class are mandatory. The property field names are discussed below.
defaultSort (optional): The default field on which the search results will be sorted. This must be a reference to an existing search sort field bean. If none is given, relevance will be the default. Sorting according to the internal relevance algorithm is always available, even though it's not explicitly mentioned in the sortFields section.
defaultSortOrder (optional): The default sort order can either be asc or desc.
sortFields (mandatory): The list of available sort options, each element in this list must link to an existing sort field configuration bean.
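Each entry in sortFields references a sort field configuration bean. A sketch of one such bean, modeled on the default sortDateIssued definition (names and values mirror the defaults described earlier; verify against your own discovery.xml):

```xml
<bean id="sortDateIssued" class="org.dspace.discovery.configuration.DiscoverySortFieldConfiguration">
    <!-- The metadata field results can be sorted on -->
    <property name="metadataField" value="dc.date.issued"/>
    <!-- "date" enables date-aware sorting rather than plain text comparison -->
    <property name="type" value="date"/>
</bean>
```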
<property name="defaultFilterQueries">
<list>
<value>query1</value>
<value>query2</value>
</list>
</property>
This property contains a simple list which in turn contains the queries. Some examples of possible queries:
search.resourcetype:2
dc.subject:test
dc.contributor.author: "Van de Velde, Kevin"
...
If the "Anonymous" group has "READ" access on the Item, then anonymous/public users will be able to view that Item's metadata and locate that Item via
DSpace's search/browse system. In addition, search engines will also be able to index that Item's metadata. However, even with Anonymous READ set at
the Item-level, you may still choose to access-restrict the downloading/viewing of files within the Item. To do so, you would restrict "READ" access on
individual Bitstream(s) attached to the Item.
If the "Anonymous" group does NOT have "READ" access on the Item, then anonymous users will never see that Item appear within their search/browse
results (essentially the Item is "invisible" to them). In addition, that Item will be invisible to search engines, so it will never be indexed by them. However,
any users who have been given READ access will be able to find/locate the item after logging into DSpace. For example, if a "Staff" group was provided
"READ" access on the Item, then members of that "Staff" group would be able to locate the item via search/browse after logging into DSpace.
If you prefer to allow all access-restricted or embargoed Items to be findable within your DSpace, you can choose to turn off Access Rights
Awareness. However, please be aware that this means that restricting "READ" access on an Item will not really do anything – the Item metadata will be
available to the public no matter what group(s) were given READ access on that Item.
This feature can be switched off by going to the [dspace.dir]/config/spring/api/discovery.xml file & commenting out the bean & the alias
shown below.
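The bean and alias are not reproduced in this copy of the text; in a typical DSpace 7 discovery.xml they look roughly like the following (the class and id names shown here are assumptions and should be verified against your own file before commenting anything out):

```xml
<!-- Comment out BOTH the bean and the alias to disable Access Rights Awareness -->
<bean class="org.dspace.discovery.SolrServiceResourceRestrictionPlugin" id="solrServiceResourceIndexPlugin"/>
<alias name="solrServiceResourceIndexPlugin" alias="org.dspace.discovery.SolrServiceResourceRestrictionPlugin"/>
```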
The Browse Engine only supports the "Access Rights Awareness" if the Solr/Discovery backend is enabled (see Defining the Storage of the Browse Data).
However, it is enabled by default for DSpace 3.x and above.
When searching in Discovery, all the groups the user belongs to will be added as a filter query, as well as the user's identifier. If the user is an admin, all items will be returned, since an admin has read rights on everything.
<property name="recentSubmissionConfiguration">
<bean class="org.dspace.discovery.configuration.DiscoveryRecentSubmissionsConfiguration">
<property name="metadataSortField" value="dc.date.accessioned"/>
<property name="type" value="date"/>
<property name="max" value="5"/>
</bean>
</property>
The property name and the bean class are mandatory. The property field names are discussed below.
metadataSortField (mandatory): The metadata field to sort on to retrieve the recent submissions
max (mandatory): The maximum number of results to be displayed as recent submissions
type (optional): the type of the search filter. It can either be date or text, if none is defined text will be used.
Disabling hit highlighting / search snippets
You can disable hit highlighting / search snippets by commenting out the entire <property name="hitHighlightingConfiguration">
Configuration in the [dspace]/config/spring/api/discovery.xml configuration file.
PLEASE BE AWARE there are two sections where this <property> definition exists. You should comment out both: one is under the <bean id="defaultConfiguration"> and one is under the <bean id="homepageConfiguration">.
Alternatively, you may also choose to tweak which fields are shown in hit highlighting, or modify the number of matching words shown (snippets) and/or
number of characters shown around the matching word (maxSize).
For this change to take effect in the User Interface, you will need to restart Tomcat.
Changes made to the configuration will not automatically be displayed in the user interface. By default, only the following fields are displayed: dc.title, dc.
contributor.author, dc.creator, dc.contributor, dc.date.issued, dc.publisher, dc.description.abstract and fulltext.
<property name="hitHighlightingConfiguration">
<bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightingConfiguration">
<property name="metadataFields">
<list>
<bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightFieldConfiguration">
<property name="field" value="dc.title"/>
<property name="snippets" value="5"/>
</bean>
<bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightFieldConfiguration">
<property name="field" value="dc.contributor.author"/>
<property name="snippets" value="5"/>
</bean>
<bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightFieldConfiguration">
<property name="field" value="dc.subject"/>
<property name="snippets" value="5"/>
</bean>
<bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightFieldConfiguration">
<property name="field" value="dc.description.abstract"/>
<!-- Max number of characters to display around the matching word (Warning setting to 0
returns entire field) -->
<property name="maxSize" value="250"/>
<!-- Max number of snippets (matching words) to show -->
<property name="snippets" value="2"/>
</bean>
<bean class="org.dspace.discovery.configuration.DiscoveryHitHighlightFieldConfiguration">
<!-- Displays snippets from indexed full text of document (for
supported formats) -->
<property name="field" value="fulltext"/>
<!-- Max number of characters to display around the matching word (Warning setting to 0
returns entire field) -->
<property name="maxSize" value="250"/>
<!-- Max number of snippets (matching words) to show -->
<property name="snippets" value="2"/>
</bean>
</list>
</property>
</bean>
</property>
The property name and the bean class are mandatory. The property field names are:
field (mandatory): The metadata field to be highlighted (can also be * if all the metadata fields should be highlighted).
maxSize (optional): Limit the number of characters displayed to only the relevant part (use metadata field as search snippet).
snippets (optional): The maximum number of snippets that can be found in one metadata field.
The rendering of search results is no longer handled by the METS format but uses a special type of list named "TYPE_DSO_LIST". Each metadata field (& fulltext, if configured) is added in the DRI, and if the field contains hit highlighting, the Java code will split up the string & add DRI highlights to the list. The XSL for the themes also contains special rendering XSL for the DRI; for Mirage, the changes are located in the discovery.xsl file. For older themes based on structural.xsl, look for the template matching "dri:list[@type='dsolist']".
<property name="moreLikeThisConfiguration">
<bean class="org.dspace.discovery.configuration.DiscoveryMoreLikeThisConfiguration">
<property name="similarityMetadataFields">
<list>
<value>dc.title</value>
<value>dc.contributor.author</value>
<value>dc.creator</value>
<value>dc.subject</value>
</list>
</property>
<!--The minimum number of matching terms across the metadata fields above before an item is found as
related -->
<property name="minTermFrequency" value="5"/>
<!--The maximum number of related items displayed-->
<property name="max" value="3"/>
<!--The minimum word length below which words will be ignored-->
<property name="minWordLength" value="5"/>
</bean>
</property>
The property name and the bean class are mandatory. The property field names are discussed below.
The feature currently requires only one line of configuration in discovery.xml. Changing the value from true to false will disable the feature.
http://wiki.apache.org/solr/SpellCheckComponent
https://cwiki.apache.org/confluence/display/solr/Spell+Checking
Declare the bean (of class: TagCloudFacetConfiguration) that holds the configuration for the tag cloud facet.
The appearance configuration can have the following properties, as shown in the following bean:
<bean id="tagCloudConfiguration" class="org.dspace.discovery.configuration.TagCloudConfiguration">
<!-- Should display the score of each tag next to it? Default: false -->
<property name="displayScore" value="true"/>
<!-- Should display the tag as center aligned in the page or left aligned? Possible values: true
| false. Default: true -->
<property name="shouldCenter" value="true"/>
<!-- How many tags will be shown. Value -1 means all of them. Default: -1 -->
<property name="totalTags" value="-1"/>
<!-- The letter case of the tags.
Possible values: Case.LOWER | Case.UPPER | Case.CAPITALIZATION | Case.PRESERVE_CASE |
Case.CASE_SENSITIVE
Default: Case.PRESERVE_CASE -->
<property name="cloudCase" value="Case.PRESERVE_CASE"/>
<!-- If the 3 CSS classes of the tag cloud should be independent of score (random=yes) or based
on the score. Possible values: true | false . Default: true-->
<property name="randomColors" value="true"/>
<!-- The font size (in em) for the tag with the lowest score. Possible values: any decimal.
Default: 1.1 -->
<property name="fontFrom" value="1.1"/>
<!-- The font size (in em) for the tag with the highest score. Possible values: any decimal.
Default: 3.2 -->
<property name="fontTo" value="3.2"/>
<!-- Tags with a score lower than this will not appear in the tag cloud. Possible values:
any integer from 1 to infinity. Default: 0 -->
<property name="cuttingLevel" value="0"/>
<!-- The distance (in px) between the tags. Default: 5 -->
<property name="marginRight" value="5"/>
<!-- The ordering of the tags (based either on the name or the score of the tag)
Possible values: Tag.NameComparatorAsc | Tag.NameComparatorDesc | Tag.ScoreComparatorAsc
| Tag.ScoreComparatorDesc
Default: Tag.NameComparatorAsc -->
<property name="ordering" value="Tag.NameComparatorAsc"/>
</bean>
When the tag cloud is rendered, there are some CSS classes that you can change in order to adjust the appearance of the tag cloud.
Class Note
<ref bean="searchFilterContentInOriginalBundle"/>
Java class: org.dspace.discovery.IndexClient
Arguments (short and long forms) — Description:
-c — clean the existing index, removing any documents that no longer exist in the db
-i <object handle> — Reindex an individual object (and any child objects). When run on an Item, it just reindexes that single Item. When run on a Collection, it reindexes the Collection itself and all Items in that Collection. When run on a Community, it reindexes the Community itself and all sub-Communities, contained Collections and contained Items.
It is recommended to run maintenance on the Discovery Solr index occasionally (from crontab or your system's scheduler), to prevent your servlet
container from running out of memory:
[dspace]/bin/dspace index-discovery
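A sketch of such a scheduled task as a crontab entry (the schedule is illustrative; replace [dspace] with your installation directory):

```
# Update the Discovery index nightly at 1:00 AM
0 1 * * * [dspace]/bin/dspace index-discovery > /dev/null 2>&1
```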
solr
  search
    conf
      protwords.txt
      schema.xml
      solrconfig.xml
      stopwords.txt
      synonyms.txt
      ...
  statistics
    conf
      protwords.txt
      schema.xml
      solrconfig.xml
      stopwords.txt
      synonyms.txt
Contextual Help Tooltips
Available in 7.5 or later.
Contextual help tooltips are a feature to provide additional information about how to use DSpace to less experienced users without cluttering the interface
for more advanced users who do not need additional instruction.
User perspective
Adding new tooltips
User perspective
If the user visits a page where contextual help tooltips are available, a "toggle context help" button appears in the header, in between the language switch
menu and user profile menu. Clicking this button toggles the visibility of the tooltips on the page (by default, they are invisible).
When tooltip visibility is turned on, similar-looking buttons appear on the page wherever a tooltip is available.
Clicking any of these buttons makes a text bubble appear containing the contextual help; clicking anywhere outside of the bubble makes it disappear again.
The mandatory `content` field represents a key in i18n files (src/assets/i18n/*.json5). You will need to add a new key to this file to store the help
text.
`id` should be a unique identifier for this tooltip, to distinguish it from other tooltips on the page.
`tooltipPlacement` (optional) determines where the text bubble appears relative to the help button. Its type is an array of Placements; see the
ng-bootstrap documentation for more information.
`iconPlacement` (optional) should be assigned either 'left' or 'right', and determines whether the tooltip will be
placed on the left or on the right of the element.
This is what the template looks like for the "Edit group" example in the "User Perspective" picture above:
<h2 class="border-bottom pb-2">
<span
*dsContextHelp="{
content: 'admin.access-control.groups.form.tooltip.editGroupPage',
id: 'edit-group-page',
iconPlacement: 'right',
tooltipPlacement: ['right', 'bottom']
}"
>
{{messagePrefix + '.head.edit' | translate}}
</span>
</h2>
Note the use of the `span` tags: setting `*dsContextHelp` directly on the `h2` element makes the help button appear all the way on the right of
the page, instead of directly to the right of the "Edit group" text.
The 'content' field maps to the i18n key which is used to display the help text. This i18n key's value may include markdown-style links (only). At
this time, other formatting is not supported. To display a link, use the following markdown syntax in your i18n value:
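The syntax example itself appears to be missing from this copy of the text; standard markdown link syntax, shown here with a hypothetical i18n key and URL, looks like:

```json5
{
  // Hypothetical tooltip text containing a markdown-style link
  "my.tooltip.key": "See the [DSpace wiki](https://wiki.lyrasis.org) for details.",
}
```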
IIIF Configuration
Overview
Format Support
Enable IIIF Support on Backend
Install a IIIF Image Server
Installing and Configuring Cantaloupe
Required IIIF Configuration
Additional Configuration Options
CORS Configuration
IIIF Search API
Enable/Install the Mirador Viewer on Frontend
Configuring Mirador
Configure IIIF viewer via Metadata Fields
Overview
Supported in 7.1 or above
IIIF support was first added to DSpace in version 7.1. It was not available in 7.0 or below.
DSpace supports the International Image Interoperability Framework (IIIF). The DSpace REST API implements the IIIF Presentation API version 2.1.1, IIIF
Image API version 2.1.1, and the IIIF Search API version 1.0 (experimental). The DSpace Angular frontend uses the Mirador 3.0 viewer.
Administrators can configure IIIF behavior at the Collection, Item, Bundle and Bitstream levels using metadata. To support additional sharing, viewing,
comparing, and annotating, DSpace can be configured to share IIIF metadata with external IIIF clients (see CORS Configuration). IIIF REST endpoints
implement the same security protocol as the primary REST API so that DSpace authorization policies are enforced for IIIF access as well.
Running IIIF in production requires an IIIF-compatible image server. You are free to use any compatible image server you choose. However, instructions
for configuring the Cantaloupe Image Server are included below. A preconfigured Cantaloupe image server can be started via docker-compose to simplify
evaluation and testing.
Format Support
Currently, DSpace only supports IIIF viewing of Image formats (any format whose MIME type starts with "image/*"). For example, PDF viewing is not
currently supported.
Enable IIIF Support on Backend
DSpace IIIF support is not enabled by default. To enable IIIF, you first need to install a IIIF Image Server, and then update your DSpace configuration as
described below.
Here is a brief overview of how the IIIF image server works with DSpace.
iiif.image.server = https://imageserver.mycampus.edu/image-server/cantaloupe/iiif/2/
Given this configuration, the IIIF manifest returned by the DSpace backend will include an image resource annotation like the following:
resource: {
@id: "https://imageserver.mycampus.edu/image-server/cantaloupe/iiif/2/4b415036-57a8-42f4-a971-
c5e982f55f92/full/full/0/default.jpg",
@type: "dctypes:Image",
service: {
@context: "http://iiif.io/api/image/2/context.json",
@id: "https://imageserver.mycampus.edu/image-server/cantaloupe/iiif/2/4b415036-57a8-42f4-a971-
c5e982f55f92",
profile: "http://iiif.io/api/image/2/level1.json",
protocol: "http://iiif.io/api/image"
},
format: "image/jp2"
}
The Mirador viewer (see below) uses this annotation to communicate with the image server using the IIIF Image API.
Finally, notice that the image server needs to retrieve the requested bitstream from DSpace. There are a number of ways to do this, and the details vary with the image server chosen. The easiest approach is for the image server to request the bitstream via HTTP and the DSpace API, e.g.:
http://dspace.mycampus.edu:8080/server/api/core/bitstreams/4b415036-57a8-42f4-a971-c5e982f55f92/content
The simplest way to configure Cantaloupe to retrieve images from DSpace is to use HttpSource with the following configuration:
HttpSource.BasicLookupStrategy.url_prefix = <dspace-url>/server/api/core/bitstreams/
HttpSource.BasicLookupStrategy.url_suffix = /content
iiif.enabled = true
In addition, you need to provide the URL for your newly installed IIIF image server, e.g.:
iiif.image.server = http://localhost:8182/iiif/2/
Finally, update dspace.cfg or your local.cfg file by adding "iiif" to the default event dispatcher, as shown below:
event.dispatcher.default.consumers = versioning, discovery, eperson, iiif
With these changes in place, DSpace will be ready to respond to IIIF requests. Restart your DSpace backend (i.e. Tomcat) for all of these changes to take effect.
Property — Description:
iiif.image.server — Base URL path for the IIIF image server, e.g. http://localhost:8182/iiif/2/
iiif.document.viewing.hint — Default viewing hint: either "individuals", "paged" or "continuous". Can be overridden with the metadata setting described below.
iiif.logo.image — Optional URL for a small image. This will be included in all IIIF manifests.
iiif.cors.allowed-origins — Comma-separated list of allowed CORS origins. The list must include the default value: ${dspace.ui.url}.
iiif.metadata.item — Sets the Dublin Core metadata that will be added to the IIIF resource manifest. This property can be repeated.
iiif.metadata.bitstream — Sets the Bitstream metadata that will be added to the IIIF canvas metadata for individual images. This property can be repeated.
iiif.license.uri — Sets the metadata used for information about the resource usage rights.
iiif.attribution — The text to use as attribution in the IIIF manifests. Defaults to: ${dspace.name}
iiif.canvas.default-width — Default value for the canvas width. Can be overridden at the item, bundle or bitstream level.
iiif.canvas.default-height — Default value for the canvas height. Can be overridden at the item, bundle or bitstream level.
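For instance, the repeatable metadata properties might be used like this in iiif.cfg (the field choices are illustrative):

```
# Show these item metadata fields in every IIIF manifest (illustrative values)
iiif.metadata.item = dc.title
iiif.metadata.item = dc.contributor.author
iiif.metadata.item = dc.date.issued
```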
Canvas Dimensions
As of 7.2, the canvas dimension options (iiif.canvas.default-width and iiif.canvas.default-height) are updated with additional behaviors.
If you do not provide your own default dimensions in iiif.cfg, DSpace will attempt to optimize canvas dimensions when dimension metadata is
missing from the first bitstream in the item. This will often produce more accurate viewer layouts, but note that it is not sufficient to assure
accurate layouts in all cases.
If you decide to add your own default dimensions in the iiif.cfg file, your dimensions are used for every bitstream that lacks dimension metadata.
You may also set both default dimensions in iiif.cfg to the value -1. In this case, DSpace creates accurate default dimensions for every
bitstream that lacks dimension metadata. Note that this impacts performance.
It is recommended that iiif.image.width and iiif.image.height metadata be added to Item, Bundle, or Bitstream metadata to assure accurate layout and top
performance. Default dimension configurations are intended to improve the user experience when dimension metadata has not yet been added.
CORS Configuration
The wildcard "*" configuration is the default CORS setting for IIIF. With this setting, all remote viewers and applications can retrieve manifests, assuring
maximum interoperability. You can restrict CORS origins using the iiif.cors.allowed-origins property defined in iiif.cfg. Remove the wildcard
and add a comma-separated list of origins instead.
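For example, to restrict access to your own UI plus one trusted external viewer (the second origin here is purely illustrative):

```
# Replace the default wildcard with an explicit origin list
iiif.cors.allowed-origins = ${dspace.ui.url}, https://mirador.example.edu
```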
iiif.search.url = ${solr.server}/word_highlighting
iiif.search.plugin = org.dspace.app.rest.iiif.service.WordHighlightSolrSearch
Once you have successfully indexed ALTO files using the Solr plugin, you can enable search within a DSpace Item by adding the iiif.search.enabled metadata field.
Indexing Support
Support for indexing OCR files using the Solr OCR Highlighting Plugin or other services is not currently provided by DSpace. Institutions will need to develop their own approach to indexing their data.
Enable/Install the Mirador Viewer on Frontend
The Mirador 3.0 viewer is included in the dspace-angular (UI) source code. Before enabling Mirador, be sure to review the instructions for installing the
Angular frontend if you haven't already.
To add the Mirador viewer to your DSpace frontend installation, run the following command:
# This builds and runs the DSpace UI with the Mirador Viewer in a single step
yarn run start:mirador:prod
This will build and copy Mirador to your dist/ directory and start the frontend server.
The actual steps for deploying the Angular UI with Mirador into Production will likely vary with your setup.
Running in Development
In the DSpace 7.1 release, the Mirador viewer cannot be used when running in development mode. For now, you need to use a production build.
Configuring Mirador
The Mirador viewer is highly configurable. The Mirador configuration file for DSpace includes a number of settings that you can override manually,
including CSS values for styling. Note that some of the Mirador behavior (like the inclusion of thumbnail navigation on the right) is set by the Angular
component at runtime. You can choose to override these runtime settings if you like.
Required Field
Note that the dspace.iiif.enabled metadata field MUST be added to the Item and set to "true". Otherwise, the Item display will use the default
DSpace view.
dspace.iiif.enabled (Item) – Stores a boolean text value (true or false) to indicate whether the IIIF feature is enabled for the DSpace object. If absent, the value is derived from the parent DSpace object.
iiif.label (Bitstream) – Metadata field used to set the IIIF label associated with the canvas resource; otherwise the system will derive one according to the configuration setting or the iiif.canvas.naming metadata field.
iiif.description (Item) – Metadata field used to set the IIIF description associated with the resource.
iiif.canvas.naming (Item) – Metadata field used to set the base label used to name all the canvases in the Item. The canvas label will be generated using the value of this metadata as prefix plus the canvas position, e.g. Page 1, Page 2, etc.
iiif.viewing.hint (Item) – Metadata field used to set the viewing hint, overriding the configuration value if any. Possible values are "individuals" and "paged". Default value: individuals.
iiif.image.width (Item, Bundle, or Bitstream) – Metadata field used to store the width of an image in pixels. Determines the canvas size.
iiif.image.height (Item, Bundle, or Bitstream) – Metadata field used to store the height of an image in pixels. Determines the canvas size.
iiif.toc (Bitstream) – Metadata field used to set the position of the IIIF resource in the "table of contents" structure.
iiif.search.enabled (Item) – Metadata field used to enable the IIIF Search service at the item level. This feature is experimental and requires additional setup.
Multilingual Support
DSpace supports a number of languages & you can even add your own translation. This may also be referred to as Localization (l10n) or
Internationalization (i18n).
Depending on which languages you wish to support, you must make sure that all the i18n-related files are available.
The different translations for this message catalog are being managed separately from the DSpace core project, in order to release updates for these
files more frequently than the DSpace software itself. Visit the dspace-api-lang project on Github.
After rebuilding DSpace, any messages files placed in this directory will be automatically included in the Server web application. Files of the same name
will override any default files. By default, this full directory path may not exist or may be empty. If it does not exist, you can simply create it. You can place
any number of translation catalogues in this directory. To add additional translations, just add another copy of the Messages.properties file translated into
the specific language and country variant you need.
For more information about the [dspace-source]/dspace/modules/ directory, and how it may be used to "overlay" (or customize) the default Server
Webapp, classes and files, please see: Advanced Customisation
Metadata localization
DSpace associates each metadata field value with a language code (though it may be left empty, e.g. for numeric values).
Localization of submission-forms.xml
The display labels for submission-forms.xml are currently not managed in the message catalogs. To localize this file, you can create versions of it in
the same folder, appending the locale code at the end of the filename, before the extension. For example, submission-forms_de.xml can be used to
translate the submission form labels into German.
There is a known bug that any translated submission forms (e.g. submission-forms_de.xml) must include all the form-definitions available in the
system. When they are not all included, DSpace will fall back to the default submission forms / locale. See https://github.com/DSpace/DSpace/issues/2827
Localization of license.default
The text in the default submission license (license.default) is currently not managed in the message catalogs. It is translatable by appending the locale code
at the end of the filename, before the extension, as for the localization of submission-forms.xml.
The User Interface translations can be found in the /src/assets/i18n/ folder of your UI's codebase. You can add additional translations & contribute
them back to the project. For details see DSpace 7 Translation - Internationalization (i18n) - Localization (l10n)
All translations of the UI are provided in JSON5 format, which includes support for inline comments.
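A fragment of such a JSON5 catalog might look like the following. The key and translation shown are illustrative only, not actual keys from the DSpace catalogs:

```json5
// src/assets/i18n/de.json5 (illustrative fragment)
{
  // Inline comments like this are permitted by JSON5
  "menu.section.browse_global": "Durchsuchen",
}
```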
You can choose which languages you wish to enable/support in your UI by modifying the language section of your config.prod.yml file, which in turn, will
generate a section like this in your environment.prod.ts configuration file:
// Default Language in which the UI will be rendered if the user's browser language is not an active language
defaultLanguage: 'en',
// Languages. DSpace Angular holds a message catalog for each of the following languages.
// When set to active, users will be able to switch to the use of this language in the user interface.
languages: [{
code: 'en',
label: 'English',
active: true,
}, {
code: 'de',
label: 'Deutsch',
active: true,
}, {
code: 'cs',
label: 'Čeština',
active: true,
}, {
code: 'nl',
label: 'Nederlands',
active: true,
}],
As shown above, the "defaultLanguage" is the language that your UI will use by default, if the user's browser has not specified a preferred language.
The array of "languages" lists all of the additional languages you wish to support.
The "code" must match the prefix of a *.json5 language file located in your /src/assets/i18n/ folder
The "label" is the text you want to display in the UI language selector (the globe in the header)
The "active" setting allows you to decide whether that language appears in the UI language selector or not.
Any changes to the language settings require rebuilding & redeploying your UI.
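For reference, the corresponding language section of config.prod.yml would take a YAML form along these lines. This is assumed to mirror the TypeScript settings above; verify the key names against the example configuration shipped with your UI:

```yaml
defaultLanguage: en
languages:
  - code: en
    label: English
    active: true
  - code: de
    label: Deutsch
    active: true
```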
System Administration
This top-level section holds all system administration aspects of DSpace, including but not limited to:
Installation
Upgrading
Troubleshooting system errors
Managing Dependencies
In this context, system administration is defined as all technical tasks required to get DSpace into a state in which it operates properly, so that its behaviour is
predictable and it can be used according to the guidelines under "Using DSpace".
Below is the "Command Help Table". This table explains what data is contained in the individual command/help tables in the sections that follow.
Many/most commands and scripts have a simple [dspace]/bin/dspace <command> command. See the Application Layer chapter for the details of
the DSpace Command Launcher, and the Command Line Operations guide for common commands.
AIP Backup and Restore
1 Background & Overview
1.1 How does this differ from traditional DSpace Backups? Which Backup route is better?
1.2 How does this help backup your DSpace to remote storage or cloud services (like DuraCloud)?
1.3 AIPs are Archival Information Packages
1.4 AIP Structure / Format
2 Running the Code
2.1 Exporting AIPs
2.1.1 Export Modes & Options
2.1.2 Exporting just a single AIP
2.1.3 Exporting AIP Hierarchy
2.1.3.1 Exporting Entire Site
2.2 Ingesting / Restoring AIPs
2.2.1 Ingestion Modes & Options
2.2.1.1 The difference between "Submit" and "Restore/Replace" modes
2.2.2 Submitting AIP(s) to create a new object
2.2.2.1 Submitting a Single AIP
2.2.2.2 Submitting an AIP Hierarchy
2.2.2.3 Submitting AIP(s) while skipping any Collection Approval Workflows
2.2.3 Restoring/Replacing using AIP(s)
2.2.3.1 Default Restore Mode
2.2.3.2 Restore, Keep Existing Mode
2.2.3.3 Force Replace Mode
2.2.3.4 Restoring Entire Site
2.3 Cleaning up from a failed import
2.4 Performance considerations
2.5 Disable User Interaction for Cron
3 Command Line Reference
3.1 Additional Packager Options
3.1.1 How to use additional options
4 Configuration in 'dspace.cfg'
4.1 AIP Metadata Dissemination Configurations
4.2 AIP Ingestion Metadata Crosswalk Configurations
4.3 AIP Ingestion EPerson Configurations
4.4 AIP Configurations To Improve Ingestion Speed while Validating
5 Common Issues or Error Messages
Configurable Entities are not fully supported by AIP Backup & Restore. Since Entities are Items, their metadata and files can be exported/imported via
AIPs. However, their relationships to other Entities cannot yet be exported (or imported) via AIPs. Therefore, restoring Entities via AIP Backup &
Restore may result in accidental data loss (namely loss of relationships). See https://github.com/DSpace/DSpace/issues/2882 for more information.
AIP Backup & Restore functionality only works with the Latest Version of Items
If you are using the Item Level Versioning functionality (disabled by default), you must be aware that this "Item Level Versioning" feature is not yet
compatible with AIP Backup & Restore. Using them together may result in accidental data loss. Currently the AIPs that DSpace generates only store
the latest version of an Item. Therefore, past versions of Items will always be lost when you perform a restore / replace using AIP tools.
Additional background information available in the Open Repositories 2010 Presentation entitled Improving DSpace Backups, Restores & Migrations
DSpace can backup and restore all of its contents as a set of AIP Files. This includes all Communities, Collections, Items, Groups and People in the
system.
This feature came out of a requirement for DSpace to better integrate with DuraCloud, and other backup storage systems. One of these requirements is to
be able to essentially "backup" local DSpace contents into the cloud (as a type of offsite backup), and "restore" those contents at a later time.
Essentially, this means DSpace can export the entire hierarchy (i.e. bitstreams, metadata and relationships between Communities/Collections/Items) into a
relatively standard format (a METS-based, AIP format). This entire hierarchy can also be re-imported into DSpace in the same format (essentially a restore
of that content in the same or different DSpace installation).
Allows one to more easily move entire Communities or Collections between DSpace instances.
Allows for a potentially more consistent backup of this hierarchy (e.g. to DuraCloud, or just to your own local backup system), rather than relying
on synchronizing a backup of your Database (stores metadata/relationships) and assetstore (stores files/bitstreams).
Provides a way for people to more easily get their data out of DSpace (whatever the purpose may be).
Provides a relatively standard format for people to migrate entire hierarchies (Communities/Collections) from one DSpace to another (or from
another system into DSpace).
How does this differ from traditional DSpace Backups? Which Backup route is better?
Traditionally, it has always been recommended to backup and restore DSpace's database and files (also known as the "assetstore") separately. This is
described in more detail in the Storage Layer section of the DSpace System Documentation. The traditional backup and restore route is still a
recommended and supported option.
However, the new AIP Backup & Restore option seeks to try and resolve many of the complexities of a traditional backup and restore. The below table
details some of the differences between these two valid Backup and Restore options.
Supported Backup/Restore Types (Traditional Backup vs. AIP Backup)

Can Backup & Restore all DSpace Content easily
    Traditional: Yes (requires two backups/restores – one for the Database and one for the Files)
    AIP: Yes (though it will not backup/restore items which are not officially "in archive")

Can Backup & Restore a Single Community/Collection/Item easily
    Traditional: No (it is possible, but requires a strong understanding of the DSpace database structure & folder organization in order to only backup & restore metadata/files belonging to that single object)
    AIP: Yes

Can Backup & Restore Item Versions
    Traditional: Yes (requires two backups/restores – one for the Database and one for the Files)
    AIP: No (currently, AIP Backup & Restore is not fully compatible with Item Level Versioning; it can only backup/restore the latest version of an Item)

Can Backup & Restore Configurable Entities
    Traditional: Yes (requires two backups/restores – one for the Database and one for the Files)
    AIP: No (currently, AIP Backup & Restore is not fully compatible with Configurable Entities; it can only backup/restore the metadata & files of the Entity, but cannot backup/restore relationships to other Entities)

Supported Object Types During Backup & Restore

Supports backup/restore of all Collection Harvesting settings (only for Collections which pull in all Items via OAI-PMH or OAI-ORE)
    Traditional: Yes
    AIP: No (this is a known issue; all previously harvested Items will be restored, but the OAI-PMH/OAI-ORE harvesting settings will be lost during the restore process)

Supports backup/restore of Item Mappings between Collections
    Traditional: Yes
    AIP: Yes (during restore, the AIP Ingester may throw a false "Could not find a parent DSpaceObject" error (see Common Issues or Error Messages) if it tries to restore an Item Mapping to a Collection that it hasn't yet restored; this error can be safely bypassed using the 'skipIfParentMissing' flag (see Additional Packager Options for more details))

Supports backup/restore of all in-process, uncompleted Submissions (or those currently in an approval workflow)
    Traditional: Yes
    AIP: No (AIPs are only generated for objects which are completed and considered "in archive")

Supports backup/restore of Items using custom Metadata Schemas & Fields
    Traditional: Yes
    AIP: Yes (custom Metadata Fields will be automatically recreated; custom Metadata Schemas must be manually created first, in order for DSpace to be able to recreate custom fields belonging to that schema; see Common Issues or Error Messages for more details)

Supports backup/restore of all local DSpace Configurations and Customizations
    Traditional: Yes (if you backup your entire DSpace directory as part of backing up your files)
    AIP: Not by default (unless you also backup parts of your DSpace directory – note, you wouldn't need to backup the '[dspace]/assetstore' folder again, as those files are already included in AIPs)
Based on your local institution's needs, you will want to choose the backup & restore process which is most appropriate for you. You may also find it
beneficial to use both types of backups on different time schedules, in order to keep to a minimum the likelihood of losing your DSpace installation settings
or its contents. For example, you may choose to perform a Traditional Backup once per week (to backup your local system configurations and
customizations) and an AIP Backup on a daily basis. Alternatively, you may choose to perform daily Traditional Backups and only use the AIP Backup as a
"permanent archives" option (perhaps performed on a weekly or monthly basis).
If you choose to use the AIP Backup and Restore option, do not forget to also backup your local DSpace configurations and customizations. Depending on
how you manage your own local DSpace, these configurations and customizations are likely in one or more of the following locations:
[dspace] - The DSpace installation directory (Please note, if you also use the AIP Backup & Restore option, you do not need to backup your [dspace]/assetstore directory, as those files already exist in your AIPs).
[dspace-source] - The DSpace source directory
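As a minimal sketch of backing up those customization directories, the following shell script archives the configuration while skipping the assetstore. The paths here are stand-ins created in a temporary directory so the example is self-contained; in practice you would point the variables at your real [dspace] and [dspace-source] locations.

```shell
set -e

# Stand-in for your real [dspace] installation directory
DSPACE_DIR=$(mktemp -d)
mkdir -p "$DSPACE_DIR/config" "$DSPACE_DIR/assetstore"
echo "dspace.name = Example Repository" > "$DSPACE_DIR/config/local.cfg"

# Archive the configuration, but leave out the assetstore:
# those files are already preserved inside your AIPs.
BACKUP_FILE=$(mktemp -d)/dspace-config-backup.tar.gz
tar -czf "$BACKUP_FILE" -C "$DSPACE_DIR" config

# List what was captured
tar -tzf "$BACKUP_FILE"
```

In a real deployment you would run such a script from cron and copy the resulting archive to the same remote storage that holds your AIPs.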
How does this help backup your DSpace to remote storage or cloud services (like DuraCloud)?
While AIP Backup and Restore is primarily a way to export your DSpace content objects to a local filesystem (or mounted drive), it can also be used as the
basis for ensuring your content is safely backed up in a remote location (e.g. DuraCloud or other cloud backup services).
Simply put, these AIPs can be generated and then replicated off to remote storage or a cloud backup service for safe keeping. You can then pull them
down either as an entire set, or individually, in order to restore one or more objects into your DSpace instance. While you could simply backup your entire
DSpace database and "assetstore" to a cloud service, you'd have to download the entire database backup again in order to restore any content. With
AIPs, you can instead just download the individual AIP files you need (which can decrease your I/O costs, if any exist) for that restoration.
This upload/download of your AIPs to a backup location can be managed manually (e.g. via your own custom code or shell scripts), or you can
use the DSpace Replication Task Suite add-on to help ease this process.
The Replication Task Suite add-on for DSpace allows you the ability to backup and restore DSpace contents to
/from AIPs via the DSpace Administrative Web Interface. It also includes "connectors" to the DuraCloud API,
so you can configure it to automatically backup/retrieve your AIPs to/from DuraCloud. Installing this add-on
means you can now easily backup and restore DSpace to DuraCloud (or other systems) simply via the
DSpace Administrative Web Interface. More information on installing and configuring this add-on can be
found on the Replication Task Suite page.
For more specific details of AIP format / structure, along with examples, please see DSpace AIP Format.
Exporting AIPs
Single AIP (default, using -d option) - Exports just an AIP describing a single DSpace object. So, if you ran it in this default mode for a Collection,
you'd just end up with a single Collection AIP (which would not include AIPs for all its child Items)
Hierarchy of AIPs (using the -d --all or -d -a option) - Exports the requested AIP describing an object, plus the AIPs for all child objects.
Some examples follow:
For a Site - this would export all Communities, Collections & Items within the site into AIP files (in a provided directory)
For a Community - this would export that Community and all SubCommunities, Collections and Items into AIP files (in a provided
directory)
For a Collection - this would export that Collection and all contained Items into AIP files (in a provided directory)
For an Item – this just exports the Item into an AIP as normal (as it already contains its Bitstreams/Bundles by default)
for example:
[dspace]/bin/dspace packager -d -t AIP -e [email protected] -i 4321/4567 aip4567.zip
The above command will export the object with the given handle (4321/4567) into an AIP file named "aip4567.zip". This will not include any child objects for
Communities or Collections.
for example:
[dspace]/bin/dspace packager -d -a -t AIP -e [email protected] -i 4321/4567 aip4567.zip
The above command will export the object with the given handle (4321/4567) into an AIP file named "aip4567.zip". In addition, it will export all child objects
to the same directory as the "aip4567.zip" file. The child AIP files are all named using the following format: [email protected]
AIPs are only generated for objects which are currently in the "in archive" state in DSpace. This means that in-progress, uncompleted submissions are not
described in AIPs and cannot be restored after a disaster.
[dspace]/bin/dspace packager -d -a -t AIP -e [email protected] -i 4321/0 sitewide-aip.zip
Again, this would export the DSpace Site AIP into the file "sitewide-aip.zip", and export AIPs for all Communities, Collections and Items into the same
directory as the Site AIP.
Ingesting / Restoring AIPs
There are three modes in which an AIP can be ingested:
1. Submit/Ingest Mode (-s option, default) – submit AIP(s) to DSpace in order to create new object(s) (i.e. the AIP is treated like a SIP – Submission
Information Package)
2. Restore Mode (-r option) – restore pre-existing object(s) in DSpace based on AIP(s). This also attempts to restore all handles and relationships
(parent/child objects). This is a specialized type of "submit", where the object is created with a known Handle, known UUID and known
relationships.
3. Replace Mode (-r -f option) – replace existing object(s) in DSpace based on AIP(s). This also attempts to restore all handles and relationships
(parent/child objects). This is a specialized type of "restore" where the contents of existing object(s) is replaced by the contents in the AIP(s). By
default, if a normal "restore" finds the object already exists, it will back out (i.e. rollback all changes) and report which object already exists.
Again, like export, there are two types of AIP Ingestion you can perform (using any of the above modes):
Single AIP (default) - Ingests just an AIP describing a single DSpace object. So, if you ran it in this default mode for a Collection AIP, you'd just
create a DSpace Collection from the AIP (but not ingest any of its child objects)
Hierarchy of AIPs (by including the --all or -a option after the mode) - Ingests the requested AIP describing an object, plus the AIPs for all child
objects. Some examples follow:
For a Site - this would ingest all Communities, Collections & Items based on the located AIP files
For a Community - this would ingest that Community and all SubCommunities, Collections and Items based on the located AIP files
For a Collection - this would ingest that Collection and all contained Items based on the located AIP files
For an Item – this just ingests the Item (including all Bitstreams & Bundles) based on the AIP file.
Submission Mode (-s mode) - creates a new object (AIP is treated like a SIP)
By default, a new Handle is always assigned
However, you can force it to use the handle specified in the AIP by specifying -o ignoreHandle=false as one of your
parameters
By default, a new Parent object must be specified (using the -p parameter). This is the location where the new object will be created.
However, you can force it to use the parent object specified in the AIP by specifying -o ignoreParent=false as one of your
parameters
By default, will respect a Collection's Workflow process when you submit an Item to a Collection
However, you can specifically skip any workflow approval processes by specifying -w parameter.
Always adds a new Deposit License to Items
Always adds new DSpace System metadata to Items (includes new "dc.date.accessioned", "dc.date.available", "dc.date.issued" and "dc.
description.provenance" entries)
WARNING: Submission mode may not be able to maintain Item Mappings between Collections. Because these mappings are recorded
via the Collection Handles, mappings may be restored improperly if the Collection handle has changed when moving content from one
DSpace instance to another.
Restore / Replace Mode (-r mode) - restores a previously existing object (as if from a backup)
By default, the Handle specified in the AIP is restored
However, for restores, you can force a new handle to be generated by specifying -o ignoreHandle=true as one of your
parameters. (NOTE: Doesn't work for replace mode as the new object always retains the handle of the replaced object)
Restore/Replace restores Handles as well as UUIDs. (NOTE: UUID restoration only possible in 7.1 or above)
By default, the object is restored under the Parent specified in the AIP
However, for restores, you can force it to restore under a different parent object by using the -p parameter. (NOTE: Doesn't
work for replace mode, as the new object always retains the parent of the replaced object)
Always skips any Collection workflow approval processes when restoring/replacing an Item in a Collection
Never adds a new Deposit License to Items (rather it restores the previous deposit license, as long as it is stored in the AIP)
Never adds new DSpace System metadata to Items (rather it just restores the metadata as specified in the AIP)
It is possible to change some of the default behaviors of both the Submission and Restore/Replace Modes. Please see the Additional Packager Options
section below for a listing of command-line options that allow you to override some of the default settings described above.
Submitting a Single AIP
AIPs treated as SIPs
This option allows you to essentially use an AIP as a SIP (Submission Information Package). The default settings will create a new DSpace object (with a
new handle and a new parent object, if specified) from your AIP.
To ingest a single AIP and create a new DSpace object under a parent of your choice, specify the -p (or --parent) package parameter to the command.
Also, note that you are running the packager in -s (submit) mode:
[dspace]/bin/dspace packager -s -t AIP -e [email protected] -p 4321/12 aip4567.zip
NOTE: This only ingests the single AIP specified. It does not ingest all children objects.
If you leave out the -p parameter, the AIP package ingester will attempt to install the AIP under the same parent it had before. As you are also specifying
the -s (submit) parameter, the packager will assume you want a new Handle to be assigned (as you are effectively specifying that you are submitting a n
ew object). If you want the object to retain the Handle specified in the AIP, you can specify the -o ignoreHandle=false option to force the packager to
not ignore the Handle specified in the AIP.
Submitting an AIP Hierarchy
This option allows you to essentially use a set of AIPs as SIPs (Submission Information Packages). The default settings will create a new DSpace object
(with a new handle and a new parent object, if specified) from each AIP.
To ingest an AIP hierarchy from a directory of AIPs, use the -a (or --all) package parameter.
for example:
[dspace]/bin/dspace packager -s -a -t AIP -e [email protected] -p 4321/12 aip4567.zip
The above command will ingest the package named "aip4567.zip" as a child of the specified Parent Object (handle="4321/12"). The resulting object is
assigned a new Handle (since -s is specified). In addition, any child AIPs referenced by "aip4567.zip" are also recursively ingested (a new Handle is also
assigned for each child AIP).
Another example – Ingesting a Top-Level Community (by using the Site Handle, <site-handle-prefix>/0):
[dspace]/bin/dspace packager -s -a -t AIP -e [email protected] -p 4321/0 community-aip.zip
The above command will ingest the package named "community-aip.zip" as a top-level community (i.e. the specified parent is "4321/0" which is a Site
Handle). Again, the resulting object is assigned a new Handle. In addition, any child AIPs referenced by "community-aip.zip" are also recursively ingested
(a new Handle is also assigned for each child AIP).
Please note: If you are submitting a large amount of content (e.g. multiple Communities/Collections) to your DSpace, you may want to tell the 'packager'
command to skip over any existing Collection approval workflows by using the -w flag. By default, all Collection approval workflows will be respected. This
means that if the content you are submitting includes a Collection with an enabled workflow, Items submitted to that Collection may be placed into its
workflow approval process rather than being made immediately available.
Therefore, if this content has already received some level of approval, you may want to submit it using the -w flag, which will skip any workflow
approval processes. For more information, see Submitting AIP(s) while skipping any Collection Approval Workflows.
Item Mappings may not be maintained when submitting an AIP hierarchy
When an Item is mapped to one or more Collections, this mapping is recorded in the AIP using the mapped Collection's handle. Unfortunately, since the
submission mode (-s) assigns new handles to all objects in the hierarchy, this may mean that the mapped Collection's handle will have changed (or even
that a different Collection will be available at the original mapped Collection's handle). DSpace does not have a way to uniquely identify Collections other
than by handle, which means that item mappings are only able to be retained when the Collection handle is also retained.
There are a few ways to work around this limitation:
1. Use the restore/replace mode (-r) instead, as it will retain existing Collection Handles. Unfortunately though, this may not work if the content is
being moved from a Test DSpace to a Production DSpace, as these existing handles may not be valid.
2. OR, use the submission mode with "-o ignoreHandle=false". This will also retain existing Collection Handles. Unfortunately though, this may
not work if the content is being moved from a Test DSpace to a Production DSpace, as these existing handles may not be valid.
3. OR, remove all existing Item Mappings and re-export AIPs (without Item Mappings). Then, import the hierarchy into the new DSpace instance
(again without Item Mappings). Finally, recreate the necessary Item Mappings using a different tool, e.g. the Batch Metadata Editing tool supports
bulk editing of Collection memberships/mappings.
Missing Groups or EPeople cannot be created when submitting an individual Community or Collection AIP
Please note, if you are using AIPs to move an entire Community or Collection from one DSpace to another, there is a known issue (see https://github.com
/DSpace/DSpace/issues/4477) that the new DSpace instance will be unable to (re-)create any DSpace Groups or EPeople which are referenced by a
Community or Collection AIP. The reason is that the Community or Collection AIP itself doesn't contain enough information to create those Groups or
EPeople (rather that info is stored in the SITE AIP, for usage during Full Site Restores).
However, there are two possible ways to get around this known issue:
EITHER, you can manually recreate all referenced Groups/EPeople in the new DSpace that you are submitting the Community or Collection AIP
into.
OR, you can temporarily disable the import of Group/EPeople information when submitting the Community or Collection AIP to the new DSpace.
This would mean that after you submit the AIP to the new DSpace, you'd have to manually go in and add in any special permissions (as needed).
To disable the import of Group/EPeople information, add these settings to your dspace.cfg file, and re-run the submission of the AIP with these
settings in place:
mets.dspaceAIP.ingest.crosswalk.METSRIGHTS = NIL
mets.dspaceAIP.ingest.crosswalk.DSPACE-ROLES = NIL
Don't forget to remove these settings after you import your Community or Collection AIP. Leaving them in place will mean that every time
you import an AIP, all of its Group/EPeople/Permissions would be ignored.
Submitting AIP(s) while skipping any Collection Approval Workflows
By default, the packager respects any Collection approval workflows. If you'd like to skip all workflow approval processes, you can use the -w flag to do so. For example, the following command will skip any
Collection approval workflows and immediately add the Item to a Collection:
[dspace]/bin/dspace packager -s -w -t AIP -e [email protected] -p 4321/12 aip4567.zip
This -w flag may also be used when Submitting an AIP Hierarchy. For example, if you are migrating one or more Collections/Communities from one
DSpace to another, you may choose to submit those AIPs with the -w option enabled. This will ensure that, if a Collection has a workflow approval process
enabled, all its Items are available immediately rather than being all placed into the workflow approval process.
Restoring/Replacing using AIP(s)
There are three restore/replace modes:
1. Default Restore Mode (-r) = Attempt to restore an object (and optionally children). Rollback all changes if any object is found to already exist.
2. Restore, Keep Existing Mode (-r -k) = Attempt to restore object (and optionally children). If an object is found to already exist, skip over it (and
all children objects), and continue to restore all other non-existing objects.
3. Force Replace Mode (-r -f) = Restore an object (and optionally children) and overwrite any existing objects in DSpace. Therefore, if an object
is found to already exist in DSpace, its contents are replaced by the contents of the AIP. WARNING: This mode is potentially dangerous as it will
permanently destroy any object contents that do not currently exist in the AIP. You may want to perform a secondary backup, unless you are sure
you know what you are doing!
Restore a Single AIP: Use this 'packager' command template to restore a single object from an AIP (not including any child objects):
Restore a Hierarchy of AIPs: Use this 'packager' command template to restore an object from an AIP along with all child objects (from their AIPs):
For example:
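The command templates and example referenced above are missing from this copy. A sketch of both, based on the flags documented in this section (`<eperson>` and the file name are placeholders):

```
# Restore a single AIP (not including any child objects)
[dspace]/bin/dspace packager -r -t AIP -e <eperson> aip4567.zip

# Restore an AIP along with all child AIPs (-a = recursive)
[dspace]/bin/dspace packager -r -a -t AIP -e <eperson> aip4567.zip
```

The second form corresponds to the "aip4567.zip" example discussed in the following paragraph.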
Notice that, unlike the -s option (for submission/ingesting), the -r option does not require the Parent Object (-p option) to be specified if it can be
determined from the package itself.
In the above example, the package "aip4567.zip" is restored to the DSpace installation with the Handle provided within the package itself (and added as a
child of the parent object specified within the package itself). In addition, any child AIPs referenced by "aip4567.zip" are also recursively ingested (the -a
option specifies to also restore all child AIPs). They are also restored with the Handles & Parent Objects provided with their package. If any object is found
to already exist, all changes are rolled back (i.e. nothing is restored to DSpace).
In some cases, when you restore a large amount of content to your DSpace, the internal database counts (called "sequences") may get out of sync with
the Handles of the content you just restored. As a best practice, it is highly recommended to always re-run the "update-sequences" script on your
DSpace database after a larger scale restore. This database script should be run while DSpace is stopped (you may either stop Tomcat or just the DSpace
webapps). PostgreSQL/Oracle must be running. Simply run:
[dspace]/bin/dspace database update-sequences
More Information on using Default Restore Mode with Community/Collection AIPs
Using the Default Restore Mode without the -a option will only restore the metadata for that specific Community or Collection. No child objects
will be restored.
Using the Default Restore Mode with the -a option will only successfully restore a Community or Collection if that object along with all of its child
objects (Sub-Communities, Collections or Items) do not already exist. In other words, if any objects belonging to that Community or Collection
already exist in DSpace, the Default Restore Mode will report an error that those object(s) could not be recreated. If you encounter this situation,
you will need to perform the restore using either the Restore, Keep Existing Mode or the Force Replace Mode (depending on whether you want to
keep or replace those existing child objects).
One special case to note: If a Collection or Community is found to already exist, its child objects are also skipped over. So, this mode will not auto-restore
items to an existing Collection.
Restore a Hierarchy of AIPs: Use this 'packager' command template to restore an object from an AIP along with all child objects (from their AIPs):
For example:
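The example command is missing from this copy. A sketch in "Restore, Keep Existing Mode", using the flags documented in this section (placeholders as before):

```
[dspace]/bin/dspace packager -r -a -k -t AIP -e <eperson> aip4567.zip
```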
In the above example, the package "aip4567.zip" is restored to the DSpace installation with the Handle provided within the package itself (and added as a
child of the parent object specified within the package itself). In addition, any child AIPs referenced by "aip4567.zip" are also recursively restored (the -a
option specifies to also restore all child AIPs). They are also restored with the Handles & Parent Objects provided with their package. If any object is found
to already exist, it is skipped over (child objects are also skipped). All non-existing objects are restored.
May also be useful in some specific restoration scenarios
This mode may also be used to restore missing objects which refer to existing objects. For example, if you are restoring a missing Collection which had
existing Items linked to it, you can use this mode to auto-restore the Collection and update those existing Items so that they again link back to the newly
restored Collection.
Potential for Data Loss
Because this mode actually destroys existing content in DSpace, it is potentially dangerous and may result in data loss! You may wish to perform a
secondary full backup (assetstore files & database) before attempting to replace any existing object(s) in DSpace.
Replace using a Single AIP: Use this 'packager' command template to replace a single object from an AIP (not including any child objects):
Replace using a Hierarchy of AIPs: Use this 'packager' command template to replace an object from an AIP along with all child objects (from their AIPs):
For example:
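The command templates for this mode are missing from this copy. A sketch of both, based on the flags documented in this section (placeholders throughout):

```
# Replace a single object from its AIP (no children)
[dspace]/bin/dspace packager -r -f -t AIP -e <eperson> aip4567.zip

# Replace an object along with all child objects from their AIPs
[dspace]/bin/dspace packager -r -a -f -t AIP -e <eperson> aip4567.zip
```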
In the above example, the package "aip4567.zip" is restored to the DSpace installation with the Handle provided within the package itself (and added as a
child of the parent object specified within the package itself). In addition, any child AIPs referenced by "aip4567.zip" are also recursively ingested. They are
also restored with the Handles & Parent Objects provided with their package. If any object is found to already exist, its contents are replaced by the
contents of the appropriate AIP.
If any error occurs, the script attempts to rollback the entire replacement process.
1. Install a completely "fresh" version of DSpace by following the Installation instructions in the DSpace Manual
At this point, you should have a completely empty, but fully-functional DSpace installation. You will need to create an initial Administrator
user in order to perform this restore (as a full-restore can only be performed by a DSpace Administrator).
2. Once DSpace is installed, run the following command to restore all its contents from AIPs
a. While the "-o skipIfParentMissing=true" flag is optional, it is often necessary whenever you are performing a large hierarchical
site restoration. Please see the Additional Packager Options section below.
Notice that you are running this command in "Force Replace" mode (-r -f). This is necessary as your empty DSpace install will already include
a few default groups (Administrators and Anonymous) and your initial administrative user. You need to replace these groups in order to restore
your prior DSpace contents completely.
<eperson> should be replaced with the Email Address of the initial Administrator (who you created when you reinstalled DSpace).
<site-handle-prefix> should be replaced with your DSpace site's assigned Handle Prefix. This is equivalent to the handle.prefix setting
in your dspace.cfg
/full/path/to/your/site-aip.zip is the full path to the AIP file which represents your DSpace SITE. This file will be named whatever you
named it when you actually exported your entire site. All other AIPs are assumed to be referenced from this SITE AIP (in most cases, they should
be in the same directory as that SITE AIP).
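Putting these placeholders together, the full-site restore command from step 2 can be sketched as a single line (the exact flag order is illustrative; the flags themselves are documented in the command options table below):

```
[dspace]/bin/dspace packager -r -a -f -t AIP -e <eperson> -i <site-handle-prefix>/0 -o skipIfParentMissing=true /full/path/to/your/site-aip.zip
```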
In some cases, when you restore a large amount of content to your DSpace, the internal database counts (called "sequences") may get out of sync with
the Handles of the content you just restored. As a best practice, it is highly recommended to always re-run the "update-sequences" script on your
DSpace database after a larger scale restore. This database script should be run while DSpace is stopped (you may either stop Tomcat or just the DSpace
webapps). PostgreSQL/Oracle must be running. Simply run:
[dspace]/bin/dspace database update-sequences
Sometimes your packager import of AIP packages can fail due to lack of memory (see below for advice on better performance; use JAVA_OPTS to
set your memory higher than the default). If that happens, DSpace by design will leave the bitstreams it did import successfully, but they will be orphaned
and will just occupy space in your assetstore. The standard DSpace cleanup cron job will clean up these orphaned bitstreams, but you can also
clean them up manually by running the following command:
[dspace]/bin/dspace cleanup -v
Performance considerations
When importing large structures like the whole site or a large collection/community, keep in mind that this can require a lot of memory, more than the
default amount of heap allocated to the command-line launcher (256 Mb: JAVA_OPTS="-Xmx256m -Dfile.encoding=UTF-8"). This memory must be
allocated in addition to the normal amount of memory allocated to Tomcat. For example, a site of 2500 fulltext items (2 Gb altogether) requires 5 Gb of
maximum heap space and takes around 1 hour, including import and indexing.
You can raise the limit for a single run of the packager command by specifying memory options in the JAVA_OPTS environment variable, e.g.:
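For example (a sketch; the heap size shown is illustrative and should be tuned to your content volume):

```
export JAVA_OPTS="-Xmx4096m -Dfile.encoding=UTF-8"
[dspace]/bin/dspace packager -r -a -t AIP -e <eperson> aip4567.zip
```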
If the importer runs out of heap memory, it will crash either with "java.lang.OutOfMemoryError: GC overhead limit exceeded", which can be suppressed by
adding "-XX:-UseGCOverheadLimit" to JAVA_OPTS, or with "java.lang.OutOfMemoryError: Java heap space". You can increase the allocated heap
memory and try again, but keep in mind that although no changes were made in the database, the unsuccessfully imported files are still left in the
assetstore (see https://github.com/DSpace/DSpace/issues/5593).
# Perform a full site backup to AIPs (with user interaction disabled) every Sunday at 1:00AM
# NOTE: Make sure to replace "123456789" with your actual Handle Prefix, and "[email protected]" with your Administrator account email.
0 1 * * * [dspace]/bin/dspace packager -u -d -a -t AIP -e [email protected] -i 123456789/0 [full-path-to-backup-folder]/sitewide-aip.zip
-a (--all) [ingest and export]: For Ingest: recursively ingest all child AIPs (referenced from this AIP). For Export: recursively export all child objects (referenced from this parent object).
-d (--disseminate) [export only]: This flag simply triggers the export of AIPs from the system. See Exporting AIPs.
-e (--eperson) [email-address] [ingest only]: The email address of the EPerson who is ingesting the AIPs. Oftentimes this should be an Administrative account.
-f (--force-replace) [ingest only]: Ingest the AIPs in "Force Replace Mode" (must be specified in conjunction with the -r flag), where existing objects will be replaced by the contents of the AIP.
-h (--help) [ingest and export]: Return help information. You should also specify -t for additional package-specific help information.
-i (--identifier) [handle] [ingest and export]: For Ingest: only valid in "Force Replace Mode"; in that mode this is the identifier of the object to replace. For Export: the identifier of the object to export to an AIP.
-k (--keep-existing) [ingest only]: Specifies to use "Restore, Keep Existing Mode" during ingest (must be specified in conjunction with the -r flag). In this mode, existing objects in DSpace will NOT be replaced by their AIPs, but missing objects will be restored from AIPs.
-o (--option) [setting]=[value] [ingest and export]: This flag is used to pass Additional Packager Options to the Packager command. Each type of packager may define its own custom Additional Options. For AIPs, the valid options are documented in the Additional Packager Options section below. This flag is repeatable (e.g. -o [setting1]=[value] -o [setting2]=[value]).
-p (--parent) [handle] [ingest only]: Handle(s) of the parent Community or Collection into which an AIP should be ingested. This may be repeatable.
-r (--restore) [ingest only]: Specifies that this ingest is either "Restore Mode" (when standalone), "Restore, Keep Existing Mode" (when used with the -k flag) or "Force Replace Mode" (when used with the -f flag).
-s (--submit) [ingest only]: Specifies that this ingest is in "Submit Mode", where an AIP is treated as a new object and assigned a new Handle/Identifier, etc.
-t (--type) [package-type] [ingest and export]: Specifies the type of package which is being ingested or exported. This controls which Ingester or Disseminator class is called. For AIPs, this is always set to "-t AIP".
-u (--no-user-interaction) [ingest and export]: Skips over all user interaction (e.g. question prompts). This flag can be used when running the packager from a script or cron job to bypass all user interaction. See also Disable User Interaction for Cron.
createMetadataFields=[value] [ingest only; default: true]: Tells the AIP ingester to automatically create any metadata fields which are found to be missing from the DSpace Metadata Registry. When 'true', this means as each AIP is ingested, new fields may be added to the DSpace Metadata Registry if they don't already exist. When 'false', an AIP ingest will fail if it encounters a metadata field that doesn't exist in the DSpace Metadata Registry. (NOTE: This will not create missing DSpace Metadata Schemas. If a schema is found to be missing, the ingest will always fail.)
filterBundles=[value] [export only; defaults to exporting all Bundles]: This option can be used to limit the Bundles which are exported to AIPs for each DSpace Item. By default, all file Bundles will be exported into Item AIPs. You could use this option to limit the size of AIPs by only exporting certain Bundles. WARNING: any bundles not included in AIPs will obviously be unable to be restored. This option can be run in two ways:
    Exclude Bundles: By default, you can provide a comma-separated list of bundles to be excluded from AIPs (e.g. "TEXT,THUMBNAIL").
    Include Bundles: If you prepend the list with the "+" symbol, then the list specifies the bundles to be included in AIPs (e.g. "+ORIGINAL,LICENSE" would only include those two bundles). This second option is identical to using the "includeBundles" option described below.
(NOTE: If you choose to no longer export LICENSE or CC_LICENSE bundles, you will also need to disable the License Dissemination Crosswalks in the aip.disseminate.rightsMD configuration for the changes to take effect.)
ignoreHandle=[value] [ingest only; Restore/Replace Mode defaults to 'false', Submit Mode defaults to 'true']: If 'true', the AIP ingester will ignore any Handle specified in the AIP itself, and instead create a new Handle during the ingest process (this is the default when running in Submit Mode, using the -s flag). If 'false', the AIP ingester attempts to restore the Handles specified in the AIP (this is the default when running in Restore/Replace Mode, using the -r flag).
ignoreParent=[value] [ingest only; Restore/Replace Mode defaults to 'false', Submit Mode defaults to 'true']: If 'true', the AIP ingester will ignore any Parent object specified in the AIP itself, and instead ingest under a new Parent object (this is the default when running in Submit Mode, using the -s flag). The new Parent object must be specified via the -p flag (run dspace packager -h for more help). If 'false', the AIP ingester attempts to restore the object directly under its old Parent (this is the default when running in Restore/Replace Mode, using the -r flag).
includeBundles=[value] [export only; default: "all"]: This option can be used to limit the Bundles which are exported to AIPs for each DSpace Item. By default, all file Bundles will be exported into Item AIPs. You could use this option to limit the size of AIPs by only exporting certain Bundles. WARNING: any bundles not included in AIPs will obviously be unable to be restored. This option expects a comma-separated list of bundle names (e.g. "ORIGINAL,LICENSE,CC_LICENSE,METADATA"), or "all" if all bundles should be included. (See the "filterBundles" option above if you wish to exclude particular Bundles. However, this "includeBundles" option cannot be used at the same time as "filterBundles".) (NOTE: If you choose to no longer export LICENSE or CC_LICENSE bundles, you will also need to disable the License Dissemination Crosswalks in the aip.disseminate.rightsMD configuration for the changes to take effect.)
manifestOnly=[value] [ingest and export; default: false]: If 'true', the AIP Disseminator will only import/export a METS Manifest XML file (i.e. the result will be an unzipped 'mets.xml' file), instead of a full AIP. This METS Manifest contains URI references to all content files, but does not contain any content files. This option is experimental and is meant for debugging purposes only. It should never be set to 'true' if you want to be able to restore content files. Again, please note that when you use this option, the final result will be an XML file, NOT the normal ZIP-based AIP format.
passwords=[value] [export only; default: false]: If 'true' (and the 'DSPACE-ROLES' crosswalk is enabled, see #AIP Metadata Dissemination Configurations), then the AIP Disseminator will export user password hashes (i.e. encrypted passwords) into the Site AIP's METS Manifest. This would allow you to restore users' passwords from the Site AIP. If 'false', then user password hashes are not stored in the Site AIP, and passwords cannot be restored at a later time.
skipIfParentMissing=[value] [ingest only; default: false]: If 'true', ingestion will skip over any "Could not find a parent DSpaceObject" errors that are encountered during the ingestion process (Note: those errors will still be logged as "warning" messages in your DSpace log file). If you are performing a full site restore (or a restore of a larger Community/Collection hierarchy), you may encounter these errors if you have a large number of Item mappings between Collections (i.e. Items which are mapped into several collections at once). When you are performing a recursive ingest, skipping these errors should not cause any problems. Once the missing parent object is ingested, it will automatically restore the Item mapping that caused the error. For more information on this "Could not find a parent DSpaceObject" error, see Common Issues or Error Messages.
unauthorized=[value] [export only; default: unspecified]: If 'skip', the AIP Disseminator will skip over any unauthorized Bundle or Bitstream encountered (i.e. it will not be added to the AIP). If 'zero', the AIP Disseminator will add a zero-length "placeholder" file to the AIP when it encounters an unauthorized Bitstream. If unspecified (the default value), the AIP Disseminator will throw an error if an unauthorized Bundle or Bitstream is encountered.
updatedAfter=[value] [export only; default: unspecified]: This option works as a basic form of "incremental backup". It requires that an ISO-8601 date is specified. When specified, the AIP Disseminator will only export Item AIPs which have a last-modified date after the specified ISO-8601 date. This option has no effect on the export of Site, Community or Collection AIPs, as DSpace does not record a last-modified date for Sites, Communities or Collections. For example, when this option is specified during a full-site export, the AIP Disseminator will export the Site AIP, all Community AIPs, all Collection AIPs, and only those Item AIPs modified after that date and time.
validate=[value] [ingest and export; Export defaults to 'true', Ingest defaults to 'false']: If 'true', every METS file in the AIP will be validated before ingesting or exporting. By default, DSpace will validate everything on export, but will skip validation during import. Validation on export will ensure that all exported AIPs properly conform to the METS profile (and will throw errors if any do not). Validation on import will ensure every METS file in every AIP is first validated before importing into DSpace (this will cause the ingestion processing to take longer, but tips on speeding it up can be found in the "AIP Configurations To Improve Ingestion Speed while Validating" section below). DSpace recommends minimally validating AIPs on export. Ideally, you should validate both on export and import, but import validation is disabled by default in order to increase the speed of AIP restores.
From the command-line, you can add the option to your command by using the -o or --option parameter.
For example:
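The example itself is missing from this copy. A sketch showing two -o options passed to a single ingest (placeholders as in the earlier examples):

```
[dspace]/bin/dspace packager -r -a -t AIP -e <eperson> -o skipIfParentMissing=true -o createMetadataFields=false aip4567.zip
```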
If you are programmatically calling the org.dspace.content.packager.DSpaceAIPIngester from your own custom script, you can specify these
options via the org.dspace.content.packager.PackageParameters class.
As a basic example:
PackageParameters params = new PackageParameters();
params.addProperty("createMetadataFields", "false");
params.addProperty("ignoreParent", "true");
Configuration in 'dspace.cfg'
The following new configurations relate to AIPs:
It is recommended to use, at a minimum, the default settings when generating AIPs. DSpace can only restore information that is included within an AIP.
Therefore, if you choose to no longer include some information in an AIP, DSpace will no longer be able to restore that information from an AIP backup.
aip.disseminate.techMD - Lists the DSpace Crosswalks (by name) which should be called to populate the <techMD> section of the METS
file within the AIP (Default: PREMIS, DSPACE-ROLES)
The PREMIS crosswalk generates PREMIS metadata for the object specified by the AIP
The DSPACE-ROLES crosswalk exports DSpace Group / EPerson information into AIPs in a DSpace-specific XML format. Using this
crosswalk means that AIPs can be used to recreate Groups & People within the system. (NOTE: The DSPACE-ROLES crosswalk should
be used alongside the METSRights crosswalk if you also wish to restore the permissions that Groups/People have within the System.
See below for more info on the METSRights crosswalk.)
aip.disseminate.sourceMD - Lists the DSpace Crosswalks (by name) which should be called to populate the <sourceMD> section of the
METS file within the AIP (Default: AIP-TECHMD)
The AIP-TECHMD Crosswalk generates technical metadata (in DIM format) for the object specified by the AIP
aip.disseminate.digiprovMD - Lists the DSpace Crosswalks (by name) which should be called to populate the <digiprovMD> section of
the METS file within the AIP (Default: None)
aip.disseminate.rightsMD - Lists the DSpace Crosswalks (by name) which should be called to populate the <rightsMD> section of the
METS file within the AIP (Default: DSpaceDepositLicense:DSPACE_DEPLICENSE, CreativeCommonsRDF:DSPACE_CCRDF,
CreativeCommonsText:DSPACE_CCTEXT, METSRights)
The DSPACE_DEPLICENSE crosswalk ensures the DSpace Deposit License is referenced/stored in AIP
The DSPACE_CCRDF crosswalk ensures any Creative Commons RDF Licenses are referenced/stored in AIP
The DSPACE_CCTEXT crosswalk ensures any Creative Commons Textual Licenses are referenced/stored in AIP
The METSRights crosswalk ensures that Permissions/Rights on DSpace Objects (Communities, Collections, Items or Bitstreams) are
referenced/stored in AIP. Using this crosswalk means that AIPs can be used to restore permissions that a particular Group or Person
had on a DSpace Object. (NOTE: The METSRights crosswalk should always be used in conjunction with the DSPACE-ROLES crosswalk
(see above) or a similar crosswalk. The METSRights crosswalk can only restore permissions, and cannot re-create Groups or EPeople
in the system. The DSPACE-ROLES crosswalk can actually re-create the Groups or EPeople as needed.)
aip.disseminate.dmd - Lists the DSpace Crosswalks (by name) which should be called to populate the <dmdSec> section of the METS file
within the AIP (Default: MODS, DIM)
The MODS crosswalk translates the DSpace descriptive metadata (for this object) into MODS. As MODS is a relatively "standard"
metadata schema, it may be useful to include a copy of MODS metadata in your AIPs if you should ever want to import them into
another (non-DSpace) system.
The DIM crosswalk just translates the DSpace internal descriptive metadata into an XML format. This XML format is proprietary to
DSpace, but stores the metadata in a format similar to Qualified Dublin Core.
mets.dspaceAIP.ingest.crosswalk.<mdType> = <DSpace-crosswalk-name>
<mdType> is the type of metadata as specified in the METS file. This corresponds to the value of the @MDTYPE attribute (of that
metadata section in the METS). When the @MDTYPE attribute is "OTHER", then the <mdType> corresponds to the @OTHERMDTYPE
attribute value.
<DSpace-crosswalk-name> specifies the name of the DSpace Crosswalk which should be used to ingest this metadata into DSpace.
You can specify the "NULLSTREAM" crosswalk if you specifically want this metadata to be ignored (and skipped over during ingestion).
mets.dspaceAIP.ingest.crosswalk.DSpaceDepositLicense = NULLSTREAM
mets.dspaceAIP.ingest.crosswalk.CreativeCommonsRDF = NULLSTREAM
mets.dspaceAIP.ingest.crosswalk.CreativeCommonsText = NULLSTREAM
The above settings tell the ingester to ignore any metadata sections which reference DSpace Deposit Licenses or Creative Commons Licenses. These
metadata sections can be safely ignored as long as the "LICENSE" and "CC_LICENSE" bundles are included in AIPs (which is the default setting). As the
Licenses are included in those Bundles, they will already be restored when restoring the bundle contents.
If unspecified in the above settings, the AIP ingester will automatically use the Crosswalk which is named the same as the @MDTYPE or
@OTHERMDTYPE attribute for the metadata section. For example, a metadata section with an @MDTYPE="PREMIS" will be processed by the DSpace
Crosswalk named "PREMIS".
mets.dspaceAIP.ingest.createSubmitter = false
In order to perform validations quickly, you can pull down a local copy of all schemas. Validation will then use this local cache, which can
sometimes speed it up by as much as a factor of 10.
To use a local cache of XML schemas when validating, use the following settings in 'dspace.cfg'. The general format is:
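The format specification itself is missing from this copy. As best it can be reconstructed (treat the property name and sample values below as an assumption to be checked against the commented defaults in your own dspace.cfg), each entry maps a schema namespace to a local file name under [dspace]/config/schemas/:

```
mets.xsd.<abbreviation> = <namespace> <local-file-name>

# hypothetical example entry:
# mets.xsd.mets = http://www.loc.gov/METS/ mets.xsd
```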
The default settings are all commented out. But, they provide a full listing of all schemas currently used during validation of AIPs. In order to utilize them,
uncomment the settings, download the appropriate schema file, and save it to your [dspace]/config/schemas/ directory (by default this directory
does not exist – you will need to create it) using the specified file name:
Ingest/Restore Error: "Group Administrator already exists" — If you receive this problem, you are likely attempting to Restore an Entire Site, but are not running the command in Force Replace Mode (-r -f). Please see the section on Restoring an Entire Site for more details on the flags you should be using.
Ingest/Restore Error: "Unknown Metadata Schema encountered (mycustomschema)" — If you receive this problem, one or more of your Items is using a custom metadata schema which DSpace is currently not aware of (in the example, the schema is named "mycustomschema"). Because DSpace AIPs do not contain enough details to recreate the missing Metadata Schema, you must create it manually via the DSpace Admin UI. Please note that you only need to create the Schema. You do not need to manually create all the fields belonging to that schema, as DSpace will do that for you as it restores each AIP. Once the schema is created in DSpace, re-run your restore command. DSpace will automatically re-create all fields belonging to that custom metadata schema as it restores each Item that uses that schema.
Ingest Error: "Could not find a parent DSpaceObject referenced as 'xxx/xxx'" — When you encounter this error message it means that an object could not be ingested/restored because it belongs to a parent object which doesn't currently exist in your DSpace instance. During a full restore process, this error can be skipped over and treated as a warning by specifying the '-o skipIfParentMissing=true' option (see Additional Packager Options). If you have a large number of Items which are mapped to multiple Collections, the AIP Ingester will sometimes attempt to restore an item mapping before the Collection itself has been restored (thus throwing this error). Luckily, this is not anything to be concerned about. As soon as the Collection is restored, the Item Mapping which caused the error will also be automatically restored. So, if you encounter this error during a full restore, it is safe to bypass this error message using the '-o skipIfParentMissing=true' option. All your Item Mappings should still be restored correctly.
Submit Error: PSQLException: ERROR: duplicate key value violates unique constraint "handle_handle_key" — This error means that while submitting one or more AIPs, DSpace encountered a Handle conflict. This is a general error that may occur in DSpace if your Handle sequence has somehow become out-of-date. However, it's easy to fix: just run the [dspace]/bin/dspace database update-sequences script described above, then re-try your submission.
DSpace AIP Format
1 Makeup and Definition of AIPs
1.1 AIPs are Archival Information Packages.
1.2 General AIP Structure / Examples
1.2.1 Customizing What Is Stored in Your AIPs
2 AIP Details: METS Structure
3 Metadata in METS
3.1 DIM (DSpace Intermediate Metadata) Schema
3.1.1 DIM Descriptive Elements for Item objects
3.1.2 DIM Descriptive Elements for Collection objects
3.1.3 DIM Descriptive Elements for Community objects
3.1.4 DIM Descriptive Elements for Site objects
3.2 MODS Schema
3.3 AIP Technical Metadata Schema (AIP-TECHMD)
3.3.1 AIP Technical Metadata for Item
3.3.2 AIP Technical Metadata for Bitstream
3.3.3 AIP Technical Metadata for Collection
3.3.4 AIP Technical Metadata for Community
3.3.5 AIP Technical Metadata for Site
3.4 PREMIS Schema
3.4.1 PREMIS Metadata for Bitstream
3.5 DSPACE-ROLES Schema
3.5.1 Example of DSPACE-ROLES Schema for a SITE AIP
3.5.2 Example of DSPACE-ROLES Schema for a Community or Collection
3.6 METSRights Schema
3.6.1 Example of METSRights Schema for an Item
3.6.2 Example of METSRights Schema for a Collection
3.6.3 Example of METSRights Schema for a Community
If you are using the Item Level Versioning functionality (disabled by default), you must be aware that this "Item Level Versioning" feature is not yet
compatible with AIP Backup & Restore. Using them together may result in accidental data loss. Currently the AIPs that DSpace generates only store
the latest version of an Item. Therefore, past versions of Items will always be lost when you perform a restore / replace using AIP tools.
Community AIP
METS contains all metadata for Community and persistent IDs referencing all members (Sub-Communities or Collections). Package may
also include a Logo file, if one exists.
METS contains any Group information for Community-specific groups (e.g. COMMUNITY_<ID>_ADMIN group).
METS contains all Community permissions/policies (translated into METSRights schema)
Collection AIP (Sample: [email protected])
METS contains all metadata for Collection and persistent IDs referencing all members (Items). Package may also include a Logo file, if
one exists.
METS contains any Group information for Collection-specific groups (e.g. COLLECTION_<ID>_ADMIN, COLLECTION_<ID>_SUBMIT,
etc.).
METS contains all Collection permissions/policies (translated into METSRights schema)
If the Collection has an Item Template, the METS will also contain all the metadata for that Item Template.
Item AIP (Sample: [email protected])
METS contains all metadata for Item and references to all Bundles and Bitstreams. Package also includes all Bitstream files.
METS contains all Item/Bundle/Bitstream permissions/policies (translated into METSRights schema)
Notes:
Bitstreams and Bundles are second-class archival objects; they are recorded in the context of an Item.
BitstreamFormats are not even second-class; they are described implicitly within Item technical metadata, and reconstructed from that during
restoration.
EPeople are only defined in Site AIP, but may be referenced from Community or Collection AIPs
Groups may be defined in Site AIP, Community AIP or Collection AIP. Where they are defined depends on whether the Group relates specifically
to a single Community or Collection, or is just a general site-wide group.
DSpace Site configurations ([dspace]/config/ directory) or customizations (themes, stylesheets, etc) are not described in AIPs
DSpace Database model (or customizations therein) is not described in AIPs
Any objects which are not currently in the "In Archive" state are not described in AIPs. This means that in-progress, unfinished submissions are
never included in AIPs.
AIP Recommendations
It is recommended to use, at a minimum, the default settings when generating AIPs. DSpace can only restore information that is included within an AIP.
Therefore, if you choose to no longer include some information in an AIP, DSpace will no longer be able to restore that information from an AIP backup.
1. You can customize your dspace.cfg settings pertaining to AIP generation. These configurations will allow you to specify exactly which DSpace
Crosswalks will be called when generating the AIP METS manifest.
2. You can export your AIPs using one of the special options/flags.
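As a sketch of option 2, an export that excludes two bundles via the filterBundles option might look like this (the Handle, e-mail address and file name are placeholders):

```
[dspace]/bin/dspace packager -d -a -t AIP -e [email protected] -i 123456789/12 -o filterBundles=TEXT,THUMBNAIL collection-aip.zip
```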
mets element
@PROFILE fixed value="http://www.dspace.org/schema/aip/1.0/mets.xsd" (this is how we identify an AIP manifest)
@OBJID URN-format persistent identifier (i.e. Handle) if available, or else a unique identifier. (e.g. "hdl:123456789/1")
@LABEL title if available
@TYPE DSpace object type, one of "DSpace ITEM", "DSpace COLLECTION", "DSpace COMMUNITY" or "DSpace SITE".
@ID is a globally unique identifier, built using the Handle and the Object type (e.g. dspace-COLLECTION-hdl:123456789/3).
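As a sketch of how these attributes fit together (this is not DSpace source code; the handle and label values are hypothetical), the root mets element could be assembled like this:

```python
# Sketch only: building the root <mets> element of an AIP manifest with
# the attributes described above, using Python's standard ElementTree.
import xml.etree.ElementTree as ET

def build_mets_root(handle, obj_type, label):
    """handle/label are hypothetical example values, not real objects."""
    return ET.Element("mets", {
        "PROFILE": "http://www.dspace.org/schema/aip/1.0/mets.xsd",
        "OBJID": f"hdl:{handle}",           # URN-format persistent identifier
        "LABEL": label,                     # title, if available
        "TYPE": f"DSpace {obj_type}",
        # @ID combines the object type and the Handle-based identifier
        "ID": f"dspace-{obj_type}-hdl:{handle}",
    })

root = build_mets_root("123456789/3", "COLLECTION", "Sample Collection")
print(root.get("ID"))  # dspace-COLLECTION-hdl:123456789/3
```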
mets/metsHdr element
@LASTMODDATE last-modified date for a DSpace Item, or nothing for other objects.
agent element:
@ROLE = "CUSTODIAN",
@TYPE = "OTHER",
@OTHERTYPE = "DSpace Archive",
name = Site handle. (Note: The Site Handle is of the format [handle_prefix]/0, e.g. "123456789/0")
agent element:
@ROLE = "CREATOR",
@TYPE = "OTHER",
@OTHERTYPE = "DSpace Software",
name = "DSpace [version]" (Where "[version]" is the specific version of DSpace software which created this AIP, e.g. "1.7.0")
mets/dmdSec element(s)
By default, two dmdSec elements are included for all AIPs:
1. object's descriptive metadata crosswalked to MODS (specified by mets/dmdSec/mdWrap@MDTYPE="MODS"). See #MODS
Schema section below for more information.
2. object's descriptive metadata in DSpace native DIM intermediate format, to serve as a complete and precise record for
restoration or ingestion into another DSpace. Specified by mets/dmdSec/mdWrap@MDTYPE="OTHER",@OTHERMDTYPE="
DIM". See #DIM (DSpace Intermediate Metadata) Schema section below for more information.
For Collection AIPs, additional dmdSec elements may exist which describe the Item Template for that Collection. Since an Item template
is not an actual Item (i.e. it only includes metadata), it is stored within the Collection AIP. The Item Template's dmdSec elements will be
referenced by a div @TYPE="DSpace ITEM Template" in the METS structMap.
When the mdWrap @TYPE value is OTHER, the element MUST include a value for the @OTHERTYPE attribute which names the crosswalk
that produced (or interprets) that metadata, e.g. DIM.
mets/amdSec element(s)
One or more amdSec elements are included for all AIPs. The first amdSec element contains administrative metadata (technical, source,
rights, and provenance) for the entire archival object. Additional amdSec elements may exist to describe parts of the archival object (e.g.
Bitstreams or Bundles in an Item).
techMD elements. By default, two types of techMD elements may be included:
PREMIS metadata about an object may be included here (currently only specified for Bitstreams (files)). Specified by md
Wrap@MDTYPE="PREMIS". See #PREMIS Schema section below for more information.
DSPACE-ROLES metadata may appear here to describe the Groups or EPeople related to this object (currently only
specified for Site, Community and Collection). Specified by mdWrap@MDTYPE="OTHER",@OTHERMDTYPE="DSPACE-
ROLES". See #DSPACE-ROLES Schema section below for more information.
rightsMD elements. By default, there are four possible types of rightsMD elements which may be included:
METSRights metadata may appear here to describe the permissions on this object. Specified by mdWrap@MDTYPE="
OTHER",@OTHERMDTYPE="METSRIGHTS". See #METSRights Schema section below for more information.
DSpaceDepositLicense if the object is an Item and it has a deposit license, it is contained here. Specified by mdWra
p@MDTYPE="OTHER",@OTHERMDTYPE="DSpaceDepositLicense".
CreativeCommonsRDF If the object is an Item with a Creative Commons license expressed in RDF, it is included
here. Specified by mdWrap@MDTYPE="OTHER",@OTHERMDTYPE="CreativeCommonsRDF".
CreativeCommonsText If the object is an Item with a Creative Commons license in plain text, it is included here.
Specified by mdWrap@MDTYPE="OTHER",@OTHERMDTYPE="CreativeCommonsText".
sourceMD element. By default, there is only one type of sourceMD element which may appear:
AIP-TECHMD metadata may appear here. This stores basic technical/source metadata about an object in a DSpace
native format. Specified by mdWrap@MDTYPE="OTHER",@OTHERMDTYPE="AIP-TECHMD". See #AIP Technical
Metadata Schema (AIP-TECHMD) section below for more information.
digiprovMD element.
Not used at this time.
mets/fileSec element
For ITEM objects:
Each distinct Bundle in an Item goes into a fileGrp. The fileGrp has a @USE attribute which corresponds to the Bundle
name.
Bitstreams in bundles become file elements under fileGrp.
mets/fileSec/fileGrp/file elements
Set @SIZE to length of the bitstream. There is a redundant value in the <techMD> but it is more accessible here.
Set @MIMETYPE, @CHECKSUM, @CHECKSUMTYPE to corresponding bitstream values. There is redundant info in the
<techMD>. (For DSpace, the @CHECKSUMTYPE="MD5" at all times)
Set @SEQ to the bitstream's SequenceID if it has one.
Set @ADMID to the list of <amdSec> element(s) which describe this bitstream.
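The attribute values above can be derived directly from the bitstream's bytes. A minimal sketch (this is not DSpace source code; the sample data and MIME type are hypothetical):

```python
# Sketch only: deriving the @SIZE, @CHECKSUM and @CHECKSUMTYPE values that
# a <file> element carries for a bitstream. DSpace always uses MD5.
import hashlib

def file_attrs(data: bytes, mimetype: str, seq: int) -> dict:
    return {
        "SIZE": str(len(data)),                      # length of the bitstream
        "MIMETYPE": mimetype,                        # bitstream's MIME type
        "CHECKSUM": hashlib.md5(data).hexdigest(),   # MD5 of the content
        "CHECKSUMTYPE": "MD5",                       # fixed for DSpace
        "SEQ": str(seq),                             # bitstream SequenceID
    }

attrs = file_attrs(b"hello world", "text/plain", 1)
print(attrs["SIZE"], attrs["CHECKSUM"])
```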
For COLLECTION and COMMUNITY objects:
Only if the object has a logo bitstream, there is a fileSec with one fileGrp child of @USE="LOGO".
The fileGrp contains one file element, representing the logo Bitstream. It has the same @MIMETYPE, @CHECKSUM, @CHECK
SUMTYPE attributes as the Item content bitstreams, but does NOT include metadata section references (e.g. @ADMID) or a @SEQ
attribute.
See the main structMap for the fptr reference to this logo file.
mets/structMap - Primary structure map, @LABEL="DSpace Object", @TYPE="LOGICAL"
For ITEM objects:
1. Top-Level div with @TYPE="DSpace Object Contents".
For every Bitstream in the Item it contains a div with @TYPE="DSpace BITSTREAM". Each Bitstream div has a single
fptr element which references the bitstream location.
If Item has primary bitstream, put it in structMap/div/fptr (i.e. directly under the div with @TYPE="DSpace Object
Contents")
For COLLECTION objects:
1. Top-Level div with @TYPE="DSpace Object Contents".
For every Item in the Collection, it contains a div with @TYPE="DSpace ITEM". Each Item div has up to two child
mptr elements:
a. One linking to the Handle of that Item. Its @LOCTYPE="HANDLE", and @xlink:href value is the raw Handle.
b. (Optional) one linking to the location of the local AIP for that Item (if known). Its @LOCTYPE="URL", and
@xlink:href value is a relative link to the AIP file on the local filesystem.
If Collection has a Logo bitstream, there is an fptr reference to it in the very first div.
If the Collection includes an Item Template, there will be a div with @TYPE="DSpace ITEM Template" within the very first
div. This div @TYPE="DSpace ITEM Template" must have a @DMDID specified, which links to the dmdSec element(s) that
contain the metadata for the Item Template.
For COMMUNITY objects:
1. Top-Level div with @TYPE="DSpace Object Contents".
For every Sub-Community in the Community it contains a div with @TYPE="DSpace COMMUNITY". Each Community
div has up to two mptr elements:
a. One linking to the Handle of that Community. Its @LOCTYPE="HANDLE", and @xlink:href value is the raw
Handle.
b. (Optional) one linking to the location of the local AIP file for that Community (if known). Its @LOCTYPE="URL",
and @xlink:href value is a relative link to the AIP file on the local filesystem.
For every Collection in the Community there is a div with @TYPE="DSpace COLLECTION". Each Collection div has
up to two mptr elements:
a. One linking to the Handle of that Collection. Its @LOCTYPE="HANDLE", and @xlink:href value is the raw
Handle.
b. (Optional) one linking to the location of the local AIP file for that Collection (if known). Its @LOCTYPE="URL",
and @xlink:href value is a relative link to the AIP file on the local filesystem.
If Community has a Logo bitstream, there is an fptr reference to it in the very first div.
For SITE objects:
1. Top-Level div with @TYPE="DSpace Object Contents".
For every Top-level Community in the Site, it contains a div with @TYPE="DSpace COMMUNITY". Each Community div has up
to two child mptr elements:
a. One linking to the Handle of that Community. Its @LOCTYPE="HANDLE", and @xlink:href value is the raw
Handle.
b. (Optional) one linking to the location of the local AIP for that Community (if known). Its @LOCTYPE="URL",
and @xlink:href value is a relative link to the AIP file on the local filesystem.
mets/structMap - Structure Map to indicate object's Parent, @LABEL="Parent", @TYPE="LOGICAL"
Contains one div element which has the unique attribute value TYPE="AIP Parent Link" to identify it as the holder of the parent
pointer.
It contains a mptr element whose xlink:href attribute value is the raw Handle of the parent object, e.g. 1721.1/4321.
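Reading the parent Handle back out of this structure map can be sketched like this (not DSpace source code; the sample manifest fragment simply mirrors the description above):

```python
# Sketch only: extracting the parent Handle from the "Parent" structMap.
import xml.etree.ElementTree as ET

sample = """
<structMap LABEL="Parent" TYPE="LOGICAL"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <div TYPE="AIP Parent Link">
    <mptr LOCTYPE="HANDLE" xlink:href="1721.1/4321"/>
  </div>
</structMap>
"""

# xlink:href is namespace-qualified once parsed
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

root = ET.fromstring(sample)
parent_handle = root.find("div[@TYPE='AIP Parent Link']/mptr").get(XLINK_HREF)
print(parent_handle)  # 1721.1/4321
```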
Metadata in METS
The following tables describe how various metadata schemas are populated (via DSpace Crosswalks) in the METS file for an AIP.
DIM (DSpace Intermediate Metadata) Schema
In the METS structure, DIM metadata always appears within a dmdSec inside an <mdWrap MDTYPE="OTHER" OTHERMDTYPE="DIM"> element. For
example:
<dmdSec ID="dmdSec_2190">
<mdWrap MDTYPE="OTHER" OTHERMDTYPE="DIM">
...
</mdWrap>
</dmdSec>
By default, DIM metadata is always included in AIPs. Its inclusion is controlled by the aip.disseminate configuration settings in your dspace.cfg.
For Communities, the following fields are translated to the DIM schema:
MODS Schema
By default, all DSpace descriptive metadata (DIM) is also translated into the MODS Schema by utilizing DSpace's MODSDisseminationCrosswalk.
DSpace's DIM to MODS crosswalk is defined within your [dspace]/config/crosswalks/mods.properties configuration file. This file allows you to
customize the MODS that is included within your AIPs.
In the METS structure, MODS metadata always appears within a dmdSec inside an <mdWrap MDTYPE="MODS"> element. For example:
<dmdSec ID="dmdSec_2189">
<mdWrap MDTYPE="MODS">
...
</mdWrap>
</dmdSec>
By default, MODS metadata is always included in AIPs. Its inclusion is controlled by the aip.disseminate configuration settings in your dspace.cfg.
The MODS metadata is included within your AIP to support interoperability. It provides a way for other systems to interact with or ingest the AIP without
needing to understand the DIM Schema. You may choose to disable MODS if you wish; however, this may decrease the likelihood that you'd be able to
easily ingest your AIPs into a non-DSpace system (unless that non-DSpace system is able to understand the DIM schema). When restoring/ingesting
AIPs, DSpace will always first attempt to restore DIM descriptive metadata. Only if no DIM metadata is found will the MODS metadata be used during a
restore.
AIP Technical Metadata Schema (AIP-TECHMD)
In the METS structure, AIP-TECHMD metadata always appears within a sourceMD inside an <mdWrap MDTYPE="OTHER" OTHERMDTYPE="AIP-TECHMD"> element. For example:
<amdSec ID="amd_2191">
...
<sourceMD ID="sourceMD_2198">
<mdWrap MDTYPE="OTHER" OTHERMDTYPE="AIP-TECHMD">
...
</mdWrap>
</sourceMD>
...
</amdSec>
By default, AIP-TECHMD metadata is always included in AIPs. It is controlled by the following configuration in your dspace.cfg:
aip.disseminate.sourceMD = AIP-TECHMD
dc.relation.isReferencedBy (Item): all other Collections this Item is linked to (Handle URN of each non-owning Collection)
dc.format.supportlevel: system support level for the format (necessary to recreate the format during restore, if the format isn't known to DSpace by default)
dc.format.internal: whether the format is internal (necessary to recreate the format during restore, if the format isn't known to DSpace by default)
Outstanding Question: Why are we recording the file format support status? That's a DSpace property, rather than an Item property. Do DSpace
instances rely on objects to tell them their support status?
Possible answer (from Larry Stone): Format support and other properties of the BitstreamFormat are recorded here in case the Item is
restored into an empty DSpace that doesn't have that format yet, and the relevant bits of the format entry have to be reconstructed from
the AIP. --lcs
dc.relation.isReferencedBy (Collection): all other Communities this Collection is linked to (Handle URN of each non-owning Community)
PREMIS Schema
At this point in time, the PREMIS Schema is only used to represent technical metadata about DSpace Bitstreams (i.e. Files). The PREMIS metadata is
generated by DSpace's PREMISCrosswalk. Only the PREMIS Object Entity Schema is used.
In the METS structure, PREMIS metadata always appears within a techMD inside an <mdWrap MDTYPE="PREMIS"> element. PREMIS metadata is
always wrapped within a <premis:premis> element. For example:
<amdSec ID="amd_2209">
...
<techMD ID="techMD_2210">
<mdWrap MDTYPE="PREMIS">
<premis:premis>
...
</premis:premis>
</mdWrap>
</techMD>
...
</amdSec>
Each Bitstream (file) has its own amdSec within a METS manifest. So, there will be a separate PREMIS techMD for each Bitstream within a single Item.
By default, PREMIS metadata is always included in AIPs. It is controlled by the following configuration in your dspace.cfg:
aip.disseminate.techMD = PREMIS, DSPACE-ROLES
DSPACE-ROLES Schema
All DSpace Group and EPerson objects are translated into a custom DSPACE-ROLES XML Schema. This XML Schema is a very simple representation of
the underlying DSpace database model for Groups and EPeople. The DSPACE-ROLES Schema is generated by DSpace's RoleCrosswalk.
Only the following DSpace Objects utilize the DSPACE-ROLES Schema in their AIPs:
Site AIP – all Groups and EPeople are represented in DSPACE-ROLES Schema
Community AIP – only Community-based groups (e.g. COMMUNITY_1_ADMIN) are represented in DSPACE-ROLES Schema
Collection AIP – only Collection-based groups (e.g. COLLECTION_2_ADMIN, COLLECTION_2_SUBMIT, etc.) are represented in DSPACE-
ROLES Schema
In the METS structure, DSPACE-ROLES metadata always appears within a techMD inside an <mdWrap MDTYPE="OTHER" OTHERMDTYPE="DSPACE-
ROLES"> element. For example:
<amdSec ID="amd_2068">
...
<techMD ID="techMD_2070">
<mdWrap MDTYPE="OTHER" OTHERMDTYPE="DSPACE-ROLES">
...
</mdWrap>
</techMD>
...
</amdSec>
By default, DSPACE-ROLES metadata is always included in AIPs. It is controlled by the following configuration in your dspace.cfg:
aip.disseminate.techMD = PREMIS, DSPACE-ROLES
<DSpaceRoles>
<Groups>
<Group ID="1" Name="Administrator">
<Members>
<Member ID="1" Name="[email protected]" />
</Members>
</Group>
<Group ID="0" Name="Anonymous" />
<Group ID="70" Name="COLLECTION_hdl:123456789/57_ADMIN">
<Members>
<Member ID="1" Name="[email protected]" />
</Members>
</Group>
<Group ID="75" Name="COLLECTION_hdl:123456789/57_DEFAULT_READ">
<MemberGroups>
<MemberGroup ID="0" Name="Anonymous" />
</MemberGroups>
</Group>
<Group ID="71" Name="COLLECTION_hdl:123456789/57_SUBMIT">
<Members>
<Member ID="1" Name="[email protected]" />
</Members>
</Group>
<Group ID="72" Name="COLLECTION_hdl:123456789/57_WORKFLOW_STEP_1">
<MemberGroups>
<MemberGroup ID="1" Name="Administrator" />
</MemberGroups>
</Group>
<Group ID="73" Name="COLLECTION_hdl:123456789/57_WORKFLOW_STEP_2">
<MemberGroups>
<MemberGroup ID="1" Name="Administrator" />
</MemberGroups>
</Group>
<Group ID="8" Name="COLLECTION_hdl:123456789/6703_DEFAULT_READ" />
<Group ID="9" Name="COLLECTION_hdl:123456789/2_ADMIN">
<Members>
<Member ID="1" Name="[email protected]" />
</Members>
</Group>
</Groups>
<People>
<Person ID="1">
<Email>[email protected]</Email>
<Netid>bsmith</Netid>
<FirstName>Bob</FirstName>
<LastName>Smith</LastName>
<Language>en</Language>
<CanLogin />
</Person>
<Person ID="2">
<Email>[email protected]</Email>
<FirstName>Jane</FirstName>
<LastName>Jones</LastName>
<Language>en</Language>
<CanLogin />
<SelfRegistered />
</Person>
</People>
</DSpaceRoles>
You may have noticed several odd looking group names in the above example, where a Handle is embedded in the name (e.g. "COLLECTION_hdl:
123456789/57_SUBMIT"). This is a translation of a Group name which included a Community or Collection Internal ID (e.g. "COLLECTION_45_SUBMIT").
Since you are exporting these Groups outside of DSpace, the Internal ID may no longer be valid or be understandable. Therefore, before export, these
Group names are all translated to include an externally understandable identifier, in the form of a Handle. If you use this AIP to restore your groups later,
they will be translated back to the normal DSpace format (i.e. the handle will be translated back to the new Internal ID).
Orphaned Groups are Renamed on Export
If a Group name includes a Community or Collection Internal ID (e.g. "COLLECTION_45_SUBMIT"), and that Community or Collection no longer exists,
then the Group is considered "Orphaned".
In 1.8.2 and above, the Group is renamed using the following format: "ORPHANED_[object-type]_GROUP_[obj-id]_[group-type]" (e.g.
"ORPHANED_COLLECTION_GROUP_10_ADMIN").
Prior to 1.8.2, the Group was renamed with a random key: "GROUP_[random-hex-key]_[object-type]_[group-type]" (e.g.
"GROUP_123eb3a_COLLECTION_ADMIN"). This old format was discontinued as giving the groups a randomly generated name caused the
SITE AIP to have a different checksum every time it was regenerated (see https://github.com/DSpace/DSpace/issues/4492).
The reasoning is that we were unable to translate an Internal ID into an External ID (i.e. Handle). If we are unable to do that translation, re-importing or
restoring a group with an old internal ID could cause conflicts or instability in your DSpace system. In order to avoid such conflicts, these orphaned
groups are renamed on export.
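The export-time renaming described above can be sketched as follows (this is not DSpace source code; the internal-ID-to-Handle lookup table is a hypothetical stand-in for DSpace's database):

```python
# Sketch only: translating object-scoped Group names for export. If the
# owning object's Handle is known, the Internal ID is replaced by it;
# otherwise the group is renamed as orphaned (the 1.8.2+ format).
internal_to_handle = {("COLLECTION", "45"): "123456789/57"}  # hypothetical lookup

def export_group_name(name: str) -> str:
    parts = name.split("_")                      # e.g. COLLECTION_45_SUBMIT
    if len(parts) < 3 or not parts[1].isdigit():
        return name                              # not an object-scoped group name
    obj_type, obj_id = parts[0], parts[1]
    group_type = "_".join(parts[2:])             # e.g. SUBMIT, WORKFLOW_STEP_1
    handle = internal_to_handle.get((obj_type, obj_id))
    if handle:
        # Replace the Internal ID with an externally meaningful Handle
        return f"{obj_type}_hdl:{handle}_{group_type}"
    # Owning object no longer exists: mark the group as orphaned
    return f"ORPHANED_{obj_type}_GROUP_{obj_id}_{group_type}"

print(export_group_name("COLLECTION_45_SUBMIT"))
print(export_group_name("COLLECTION_10_ADMIN"))
```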
The following example is for a Collection, which has associated Administrator, Submitter, and Workflow approver groups. In this very simple example, each
group has only one Person as a member. Please notice that the Person's information (Name, NetID, etc.) is NOT contained in this content; however, it is
available in the DSPACE-ROLES example for a SITE, as shown above.
<DSpaceRoles>
<Groups>
<Group ID="9" Name="COLLECTION_hdl:123456789/2_ADMIN" Type="ADMIN">
<Members>
<Member ID="1" Name="[email protected]" />
</Members>
</Group>
<Group ID="13" Name="COLLECTION_hdl:123456789/2_SUBMIT" Type="SUBMIT">
<Members>
<Member ID="2" Name="[email protected]" />
</Members>
</Group>
<Group ID="10" Name="COLLECTION_hdl:123456789/2_WORKFLOW_STEP_1" Type="WORKFLOW_STEP_1">
<Members>
<Member ID="1" Name="[email protected]" />
</Members>
</Group>
<Group ID="11" Name="COLLECTION_hdl:123456789/2_WORKFLOW_STEP_2" Type="WORKFLOW_STEP_2">
<Members>
<Member ID="2" Name="[email protected]" />
</Members>
</Group>
<Group ID="12" Name="COLLECTION_hdl:123456789/2_WORKFLOW_STEP_3" Type="WORKFLOW_STEP_3">
<Members>
<Member ID="1" Name="[email protected]" />
</Members>
</Group>
</Groups>
</DSpaceRoles>
METSRights Schema
All DSpace Policies (permissions on objects) are translated into the METSRights schema. This is different from the above DSPACE-ROLES schema,
which only represents Groups and People objects. Instead, the METSRights schema is used to translate the permission statements (e.g. a group named
"Library Admins" has Administrative permissions on a Community named "University Library"). But the METSRights schema doesn't represent who is a
member of a particular group (that is defined in the DSPACE-ROLES schema, as described above).
The METSRights Schema must be used in conjunction with the DSPACE-ROLES Schema for Groups, People and Permissions to all be restored properly.
As mentioned above, the METSRights metadata can only be used to restore permissions (i.e. DSpace policies). The DSPACE-ROLES metadata must also
exist if you wish to restore the actual Group or EPeople objects to which those permissions apply.
All DSpace Object's AIPs (except for the SITE AIP) utilize the METSRights Schema in order to define what permissions people and groups have on that
object. Although there are several sections to the METSRights Schema, DSpace AIPs only use the <RightsDeclarationMD> section, as this is what is
used to describe rights on an object.
In the METS structure, METSRights metadata always appears within a rightsMD inside an <mdWrap MDTYPE="OTHER" OTHERMDTYPE="
METSRIGHTS"> element. For example:
<amdSec ID="amd_2068">
...
<rightsMD ID="rightsMD_2074">
<mdWrap MDTYPE="OTHER" OTHERMDTYPE="METSRIGHTS">
...
</mdWrap>
</rightsMD>
...
</amdSec>
By default, METSRights metadata is always included in AIPs. It is controlled by the following configuration in your dspace.cfg:
aip.disseminate.rightsMD = DSpaceDepositLicense:DSPACE_DEPLICENSE, \
CreativeCommonsRDF:DSPACE_CCRDF, CreativeCommonsText:DSPACE_CCTEXT, METSRIGHTS
Below is an example of a METSRights section for a publicly visible Bitstream, Bundle or Item. Notice it specifies that the "GENERAL PUBLIC" has
permission to DISCOVER or DISPLAY this object.
As of DSpace 3, DSpace policies/permissions may also have a "start-date" or "end-date" (to support Embargo functionality). Such a policy on an Item may
look like this. Notice it specifies that the "GENERAL PUBLIC" has the permission to DISCOVER or DISPLAY this object starting on 2015-01-01, while the
Group "Staff" has permission to DISCOVER or DISPLAY this object until 2015-01-01.
Below is an example of a METSRights section for a publicly visible Collection, which also has an Administrator group, a Submitter group, and a group for
each of the three DSpace workflow approval steps. You'll notice that each group is granted very specific permissions within the Collection.
Submitters & Workflow approvers can "ADD CONTENTS" to a collection (but cannot delete the collection). Administrators have full rights.
<rights:RightsDeclarationMD>
    <rights:Context CONTEXTCLASS="MANAGED_GRP">
        <rights:UserName USERTYPE="GROUP">COLLECTION_hdl:123456789/2_SUBMIT</rights:UserName>
        <rights:Permissions DISCOVER="true" DISPLAY="true" MODIFY="true" DELETE="false" OTHER="true"
            OTHERPERMITTYPE="ADD CONTENTS" />
    </rights:Context>
<rights:Context CONTEXTCLASS="MANAGED_GRP">
<rights:UserName USERTYPE="GROUP">COLLECTION_hdl:123456789/2_WORKFLOW_STEP_3</rights:UserName>
<rights:Permissions DISCOVER="true" DISPLAY="true" MODIFY="true" DELETE="false" OTHER="true"
OTHERPERMITTYPE="ADD CONTENTS" />
</rights:Context>
<rights:Context CONTEXTCLASS="MANAGED_GRP">
<rights:UserName USERTYPE="GROUP">COLLECTION_hdl:123456789/2_WORKFLOW_STEP_2</rights:UserName>
<rights:Permissions DISCOVER="true" DISPLAY="true" MODIFY="true" DELETE="false" OTHER="true"
OTHERPERMITTYPE="ADD CONTENTS" />
</rights:Context>
<rights:Context CONTEXTCLASS="MANAGED_GRP">
<rights:UserName USERTYPE="GROUP">COLLECTION_hdl:123456789/2_WORKFLOW_STEP_1</rights:UserName>
<rights:Permissions DISCOVER="true" DISPLAY="true" MODIFY="true" DELETE="false" OTHER="true"
OTHERPERMITTYPE="ADD CONTENTS" />
</rights:Context>
<rights:Context CONTEXTCLASS="MANAGED_GRP">
<rights:UserName USERTYPE="GROUP">COLLECTION_hdl:123456789/2_ADMIN</rights:UserName>
<rights:Permissions DISCOVER="true" DISPLAY="true" COPY="true" DUPLICATE="true" MODIFY="true" DELETE="true"
PRINT="true" OTHER="true" OTHERPERMITTYPE="ADMIN" />
</rights:Context>
<rights:Context CONTEXTCLASS="GENERAL PUBLIC">
<rights:Permissions DISCOVER="true" DISPLAY="true" MODIFY="false" DELETE="false" />
</rights:Context>
</rights:RightsDeclarationMD>
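Reading such a rights context back out of an AIP can be sketched like this (not DSpace source code; the sample mirrors the GENERAL PUBLIC context in the example above, with the METSRights namespace declared explicitly):

```python
# Sketch only: parsing a METSRights <Context> to check whether the
# general public may DISCOVER/DISPLAY the object.
import xml.etree.ElementTree as ET

RIGHTS = "http://cosimo.stanford.edu/sdr/metsrights/"  # METSRights namespace
sample = f"""
<rights:RightsDeclarationMD xmlns:rights="{RIGHTS}">
  <rights:Context CONTEXTCLASS="GENERAL PUBLIC">
    <rights:Permissions DISCOVER="true" DISPLAY="true"
                        MODIFY="false" DELETE="false"/>
  </rights:Context>
</rights:RightsDeclarationMD>
"""

root = ET.fromstring(sample)
ctx = root.find(f"{{{RIGHTS}}}Context")
perms = ctx.find(f"{{{RIGHTS}}}Permissions")
public_can_read = (ctx.get("CONTEXTCLASS") == "GENERAL PUBLIC"
                   and perms.get("DISCOVER") == "true"
                   and perms.get("DISPLAY") == "true")
print(public_can_read)
```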
Below is an example of a METSRights section for a publicly visible Community, which also has an Administrator group. As you'll notice, this content looks
very similar to the Collection METSRights section (as described above).
Ant targets and options
1 Options
2 Targets
A word of warning: in order to ensure proper permissions and file ownership are maintained, you are advised to run these ant targets as the service user
(commonly 'dspace' or 'tomcat'). Running them as any other user may cause permission problems.
Options
DSpace allows three property values to be set using the -D<property>=<value> option. They may be used in other contexts than noted below, but take
care to understand how a particular property will affect a target's outcome.
overwrite
Whether to overwrite configuration files in [dspace]/config. If true, files from [dspace]/config and subdirectories are backed up with .old extension and
new files are installed from [dspace-src]/dspace/config and subdirectories; if false, existing config files are untouched, and new files are written beside
them with .new extension.
Default: true
config
If a path is specified, ant uses values from the specified file and installs it in [dspace]/config in the appropriate contexts.
Default: [dspace-src]/config/dspace.cfg
Context: update, update_configs, update_code, update_webapps, init_configs, fresh_install, test_database, setup_database, load_registries,
clean_database
wars
Default: true
Targets
Target: Effect

update: Creates backup copies of the [dspace]/bin, /etc, /lib, and /webapps directories with the form /<directory>.bak-<date-time>. Creates new
copies of the [dspace]/config, /etc, and /lib directories. Does not affect data files or the database. (See overwrite, config, wars options.)

update_configs: Updates the [dspace]/config directory with new configuration files. (See config option.)

update_code: Creates backup copies of the [dspace]/bin, /etc, and /lib directories with the form /<directory>.bak-<date-time>. Creates new copies of
the [dspace]/config, /etc, and /lib directories. (See config option.)

install_code: Deletes the existing [dspace]/bin, /lib, and /etc directories, and installs new copies; overwrites /solr application files, leaving data intact.
(See config option.)

fresh_install: Performs a fresh installation of the software, including the database & config. (See config, wars options.)

test_database: Tests the database connection using parameters specified in dspace.cfg. (See config option.)

clean_backups: Removes [dspace]/bin, /etc, /lib, and /webapps directories with .bak* extensions.
Command Line Operations
1 Executing command line operations
2 Available operations
2.1 General use
2.2 Legacy statistics
2.3 SOLR Statistics
The DSpace command launcher (CLI interface) offers the execution of different maintenance operations. As most of these are already documented in
related parts of the documentation, this page is mainly intended to provide an overview of all available CLI operations, with links to the appropriate
documentation.
Examples:
bin/dspace -h
bin/dspace cleanup -h
bin/dspace cleanup
Available operations
Some operations can also be run as "Processes" (or Scripts) from the administrative User Interface or REST API. Those Scripts have a detailed
description in our REST API documentation at https://github.com/DSpace/RestContract/blob/main/scripts/
General use
bitstore-migrate: Migrate all files (bitstreams) from one assetstore (bitstore) to another
checker: Run the checksum checker
checker-emailer: Send emails related to the checksum checker
classpath: Calculate and display the DSpace classpath
cleanup: Remove deleted bitstreams from the assetstore
community-filiator: Tool to manage community and sub-community relationships
create-administrator: Create a DSpace administrator account (see Installing DSpace)
curate: Perform curation tasks on DSpace objects
database: Perform various tasks / checks of the DSpace database
doi-organiser: Transmit information about DOIs to the registration agency.
dsprop: View the value of a DSpace property from any configuration file (see Configuration Reference)
dsrun: Run a (DSpace) Java class directly (used mainly for test purposes)
embargo-lifter: Pre DSpace 3.0 embargo manager tool used to check, list and lift embargoes
export: Export items or collections
filter-media: Perform the media filtering to extract full text from documents and to create thumbnails
generate-sitemaps: Generate search engine and html sitemaps (see Search Engine Optimization)
harvest: Manage the OAI-PMH harvesting of external collections (see OAI harvesting docs)
import: Import items into DSpace (see Importing and Exporting Items via Simple Archive Format (SAF))
index-authority: Import authorities and keep the SOLR authority index up to date
index-discovery: Update Discovery (Solr) search and browse Index
itemupdate: Item update tool for altering metadata and bitstream content in items (see Updating Items via Simple Archive Format)
make-handle-config: Run the handle server simple setup command
metadata-export: Export metadata for batch editing
metadata-import: Import metadata after batch editing
migrate-embargo: Embargo manager tool used to migrate old version of Embargo to the new one included in dspace3
oai: OAI script manager
packager: Execute a packager
process-cleaner: Delete old Processes from the system
rdfizer: Tool to convert contents to RDF
read: Execute a stream of commands from a file or pipe
registry-loader: Load entries into a registry (see Metadata and Bitstream Format Registries)
structure-builder: Build DSpace community and collection structure (see Exporting and Importing Community and Collection Hierarchy)
sub-daily: Send daily subscription notices
test-email: Test the DSpace email server settings are OK
update-handle-prefix: Update handle records and metadata when moving from one Handle prefix to another
user: Create, List, Update, Delete EPerson (user) records
validate-date: Test date-time format rules
version: Show DSpace version and other troubleshooting information
Legacy statistics
DSpace 7.x does not yet support legacy statistics
Legacy/log-based statistics are not available in DSpace 7.x. They are under discussion, as this feature is not widely used, and are tentatively scheduled
for possible removal. See https://github.com/DSpace/DSpace/issues/2852
Legacy statistics parse the DSpace log files and compile information based on the "[dspace]/config/dstat.cfg" configuration file. They are no longer actively
maintained, but still exist in the codebase because they report on information that is not yet accessible in (or replaced by) SOLR Statistics. Where
possible, we recommend using SOLR Statistics and/or Google Analytics for more accurate data.
SOLR Statistics
Scripts for the statistics that are stored in SOLR:
solr-export-statistics: Export Solr statistics data to CSV (for backup or moving to another server)
solr-import-statistics: Import Solr statistics data from CSV (for restoration, or moving to another server)
solr-reindex-statistics: Reindex Solr statistics data (for upgrades or updates to Solr schema)
stats-log-converter: Convert dspace.log files ready for import into solr statistics
stats-log-importer: Import previously converted log files into solr statistics
stats-util: Statistics Client for Maintenance of Solr Statistics Indexes
Database Utilities
1 "database" command
2 Guide to Flyway Migration States
"database" command
This command can be used at any time to manage or upgrade the Database. It will also assist in troubleshooting PostgreSQL and Oracle connection
issues with the database.
Valid Arguments:

"test" - Test that the database connection settings (in [dspace]/config/dspace.cfg or local.cfg) are OK and working properly. This command also validates that the database version is compatible with DSpace.

"info" / "status" - Provide detailed information about the DSpace database itself. This includes the database type, version, driver, schema, and any successful/failed/pending database migrations. This command, along with "test", is very useful in debugging issues with your database.

"migrate" - Migrate the database to the latest version (if not already on the latest version). This uses FlywayDB along with embedded migration scripts to automatically update your database to the latest version.
"migrate ignored" will run a migration which also includes any database migrations flagged as "Ignored" (or "Skipped") by the "info" command. If these "Ignored" migrations succeed, they will then be noted (in the "info" command) as having run "Out Of Order" (i.e. they were successful, but they were executed out of the normal, numerical order).
"migrate force" (available in 7.1 and later) will run a migration even when no new migrations exist (i.e. no migrations are currently flagged as "Pending" by the "info" command). This can be used to force the post-migration ("callback") scripts to run. Normally, these post-migration scripts only run after new migrations are applied. They will (re-)initialize your database with required objects, such as the "Site" object, default groups (Administrator/Anonymous), and default metadata registry and bitstream format registry entries.

"repair" - Attempt to "repair" any migrations which are flagged as "Failed" by the "info" command and/or resolve failed checksum validation. This runs the FlywayDB repair command. Please note, however, that this will NOT automatically repair corrupt or broken data in your database. It merely tries to re-run previously "Failed" migrations and/or realign the checksums of the applied migrations with those of the available migrations.

"skip" - (Available in 7.5 and later) Allows you to "skip" individual database migrations. Skipping a migration flags it as having run successfully (either "Success" or "Out of Order" status), but the migration will not be executed.
WARNING: You should ONLY skip migrations which are no longer required or have become obsolete. Skipping a REQUIRED migration may result in DSpace failing to start up or function properly. The only fix for that scenario is to run the migration manually (by executing the SQL directly on the database). Therefore, this "skip" command should ONLY be used when the migration is known to be obsolete or no longer valid. All other usages are unsupported.

"update-sequences" - Update database sequences after running a bulk ingest (e.g. AIP Backup and Restore) or data migration.

"validate" - Validate the checksums of all previously run database migrations. This runs the FlywayDB 'validate' command.

"clean" - Completely and permanently delete all tables and data in this database. WARNING: There is no turning back! If you run this command, you will lose your entire database and all its contents. This command is only useful for testing or for reverting your database to a "fresh install" state (e.g. running "dspace database clean" followed by "dspace database migrate" will return your database to a fresh install state). By default the 'clean' command is disabled (to avoid accidental data loss). In order to enable it, you must first set db.cleanDisabled = false in either your local.cfg or dspace.cfg.
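For example, a typical check-then-upgrade session combines the arguments above in this order (paths assume a standard [dspace] install; the exact output will vary by site):

```shell
# Verify connection settings and database compatibility
[dspace]/bin/dspace database test

# Review applied/pending Flyway migrations
[dspace]/bin/dspace database info

# Apply any pending migrations
[dspace]/bin/dspace database migrate
```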
Guide to Flyway Migration States
Whenever you run the "info" or "status" option (e.g. ./dspace database info), you'll see a table listing all Flyway migrations which were run on your
database.
In the "State" column of that table you'll see the various states that Flyway returns for each migration:
"Baseline" - This state simply notes when your database was originally created. In other words, it's the first migration in the table
"Success" - The state means that the migration was successful. It completed without any errors.
"Out of Order" - This state means the migration was successful but was run in a different order than the default. By default, Flyway will run all
migrations in numerical order (based on the "Version" column in that table). These "Out of Order" entries are normal and will occur in almost
every production DSpace. These entries are a sign of backported bug fixes which required a database migration. Those backported fixes may
be required to run in non-numerical order.
For example: If we found a database issue that impacts both 7.x and 8.x, we may create a migration numbered "7.9.[date]" which is safe
to run on both 7.x and 8.x. If your site is already on 8.x, this migration would be flagged as "Out of Order" because it starts with "7.9.*"
and was run after other "8.0.*" migrations were already run.
"Ignored" - This state means the migration was skipped (because it's not in numerical order) and has not yet been run. To run it, you'd run "./dspace database migrate ignored". Ignored migrations will move into the "Out of Order" state if they are successfully run.
"Pending" - This state means the migration has not yet been run. It's waiting for the next time you call "./dspace database migrate".
Executing streams of commands
You can pass a sequence of commands into the dspace command-line tool using the read command.
Handle.Net Registry Support
DSpace comes with support for CNRI's Handle.Net Registry (HNR). This feature is completely optional, as DSpace functions the same with or without
using a Handle Server/Registry.
You'll notice that while you've been playing around with a test server, DSpace has apparently been creating (fake) handles for you, looking like hdl:123456789/24 and so forth. These aren't really Handles, since the global Handle system doesn't actually know about them, and lots of other
DSpace test installs will have created the same IDs. They're only really Handles once you've registered a prefix with CNRI (see below) and have
correctly set up the Handle server included in the DSpace distribution. This Handle server communicates with the rest of the global Handle
infrastructure so that anyone that understands Handles can find the Handles your DSpace has created.
If you want to use the Handle system, you'll need to set up a Handle server. One is included with DSpace.
If you want to use the Handle system, you'll also need to obtain a Handle prefix from the central CNRI Handle site. This requires a small yearly fee to CNRI.
Again, all of this is completely optional. But the key benefit is that it provides you with persistent, permanent URLs (of the form https://hdl.handle.net/[prefix]/[suffix]) for every object within your DSpace site. Those persistent URLs may be useful for citations or even
during upgrades/migrations, as DSpace + Handle.Net ensures that these URLs always go to the right object, even if your site's main URL
changes.
A Handle server runs as a separate process that receives TCP requests from other Handle servers, and issues resolution requests to a global server or
servers if a Handle entered locally does not correspond to some local content. The Handle protocol is based on TCP, so it will need to be installed on a
server that can send and receive TCP on port 2641.
You can either use a Handle server running on the same machine as DSpace, or you can install it on a separate machine. Installing it on the same
machine is a little bit easier. If you install it on a separate machine, you can use one Handle server for more than one DSpace installation.
If you choose to set a passphrase, you may need to start the Handle Server via: [dspace]/bin/dspace dsrun net.handle.server.Main [dspace]/handle-server
1. To configure your DSpace installation to run the handle server, run the following command:
[dspace]/bin/make-handle-config
Ensure that [dspace]/handle-server matches whatever you have in dspace.cfg for the handle.dir property. You will need to answer a series of questions to configure the server. For the most part, you can use the default options, except you should choose to not encrypt your certificates when prompted.
2. Edit the resulting [dspace]/handle-server/config.dct file to include the following lines in the "server_config" clause:
"storage_type" = "CUSTOM"
"storage_class" = "org.dspace.handle.HandlePlugin"
"enable_txn_queue" = "no"
This tells the Handle server to get information about individual Handles from the DSpace code and to disable transaction replication. If you used
the make-handle-config script, these should already be set in your config.dct file.
3. Once the configuration file has been generated, you will need to go to https://hdl.handle.net/4263537/5014 to upload the generated sitebndl.zip
file. The upload page will ask you for your contact information. An administrator will then create the naming authority/prefix on the root service
(known as the Global Handle Registry), and notify you when this has been completed. You will not be able to continue the handle server
installation until you receive further information concerning your naming authority.
4. When CNRI has sent you your naming authority prefix, you will need to edit the config.dct file. The file will be found in /[dspace]/handle-server.
Look for "300:0.NA/123456789". Replace 123456789 with the assigned naming authority prefix sent to you. Also change the value of handle.prefix in [dspace]/config/local.cfg from "123456789" to your assigned naming authority prefix, so that DSpace will use that prefix when assigning new Handles.
5. Now start your handle server (as the dspace user):
[dspace]/bin/start-handle-server
[dspace]/bin/start-handle-server.bat
Note that since the DSpace code manages individual Handles, administrative operations such as Handle creation and modification aren't supported by
DSpace's Handle server.
The Handle server you use must be dedicated to resolving Handles from DSpace. You cannot use a Handle server that is already in use with other software.
You can use CNRI's Handle Software -- all you have to do is to add to it a plugin that is provided by DSpace. The following instructions were tested with
CNRI's Handle software version 9.1.0. You can do the following steps on another machine than the machine DSpace runs on, but you have to copy some
files from the machine on which DSpace is installed.
1. Set the following two configuration properties for every DSpace backend that you are running:
DSpace backend configuration to activate the endpoints used by the remote handle resolver
handle.remote-resolver.enabled = true
handle.hide.listhandles = false
2. Download the CNRI Handle Software: https://www.handle.net/download.html. In the tarball you'll find a README.txt with installation instructions -- follow them.
3. After installing the CNRI Handle Software you should have two directories: one that contains the CNRI software and one that contains the configuration of your local Handle Server. For the rest of these instructions we assume that the directory containing the CNRI Software is /hs/handle-9.1.0 and the directory containing the configuration of your local server is /hs/srv_1. (We use the same paths here as CNRI's README.txt.)
4. Download the plugin from https://github.com/DSpace/Remote-Handle-Resolver/releases. Select a release. You can get the source and build it
yourself, or just use the JAR file included in the release. In either case, once you have a dspace-remote-handle-resolver-VERSION.jar,
copy it to the directory containing the CNRI software (/hs/handle-9.1.0/lib).
5. Create the directory /hs/srv_1/logs.
6. Create the following two files in /hs/srv_1.
log4j-handle-plugin.properties
log4j.rootCategory=INFO, A1
log4j.appender.A1=org.apache.log4j.DailyRollingFileAppender
log4j.appender.A1.File=/hs/srv_1/logs/handle-plugin.log
log4j.appender.A1.DatePattern='.'yyyy-MM-dd
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
log4j.appender.A1.layout.ConversionPattern=%d %-5p %c @ %m%n
log4j.logger.org.apache.axis.handlers.http.HTTPAuthHandler=INFO
Change the path in the third line, if necessary.
handle-dspace-plugin.cfg
This file lists the endpoint(s) of your DSpace installation(s); each endpoint must point to the DSpace 7 REST API (as configured in $dspace.server.url). If you run more than one DSpace Installation, you may add more DSpace Endpoints. Just increase the number at the end of the key for each: endpoint2, endpoint3...
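As a sketch, a minimal handle-dspace-plugin.cfg could look like the following; the key name "endpoint1" follows the numbering pattern described above, and the URL is a placeholder you must replace with your own $dspace.server.url:

```
# Point each endpoint at the REST API of a running DSpace backend
endpoint1 = https://demo.dspace.org/server
# Additional installations: endpoint2, endpoint3, ...
```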
7. Edit the file /hs/srv_1/config.dct to include the following lines in the "server_config" clause:
"storage_type" = "CUSTOM"
"storage_class" = "org.dspace.handle.MultiRemoteDSpaceRepositoryHandlePlugin"
8. Edit /hs/handle-9.1.0/bin/hdl:
a. Find a line that contains exec java ... net.handle.server.Main ...
Please note: The Handle Server will only start if it is able to connect to at least one running DSpace Installation. It only resolves the handles of the
DSpace Installations that were running when it was started.
1. Handles that don't exist will still generate a Handle record with a URL, even though resolving that URL will show an error page.
2. Handle records can only be generated based on the handle and the template. If you need to look up information in DSpace in order to generate the correct URL for a given handle, you will need to use a storage plugin instead.
The Handle server you use must be dedicated to resolving Handles from DSpace. You cannot use a Handle server that is already in use with other software.
The following instructions were tested with CNRI's Handle software version 9.1.0.
In the "namespace" section, replace "https://demo.dspace.org/handle/" with the url endpoint for your DSpace server. The "${handle}"
part of the template will be replaced with the full handle to be resolved.
4. If your handle server is running, restart it.
This configuration is a minimal example of how to configure template handles for DSpace. For more details about configuring template handles, see the Handle Technical Manual, Chapter 11 (PDF download).
This script will change any handles currently assigned prefix 123456789 to prefix 1303, so for example handle 123456789/23 will be updated to 1303/23 in
the database.
Logical Item Filtering and DOI Filtered Provider for DSpace
Section One: DSpace Logical Item filtering (org.dspace.content.logic.*)
LogicalStatement
Filters
Operators
Conditions
Configuring Filters in Spring
Running Tests on the Command Line
Using Filters in other Spring Services
Section Two: DOI Filtered Provider
New FilteredProvider: DOIIdentifierProvider
New Curation Task:
LogicalStatement
LogicalStatement is a simple interface ultimately implemented by all the other interfaces and classes described below. It just requires that a class
implements a Boolean getResult(context, item) method.
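To make the shape of that contract concrete, here is a stand-alone sketch of the pattern using stub Context and Item types. This is an illustration only, not the actual DSpace source: the real interface lives in org.dspace.content.logic, returns Boolean, and may throw LogicalStatementException.

```java
// Stand-alone sketch of the LogicalStatement pattern with stub types.
interface Context {}
interface Item { String getTitle(); }

// Everything in the logic framework ultimately implements this interface.
interface LogicalStatement {
    boolean getResult(Context context, Item item);
}

// A Condition-style leaf: true when the item's title matches a regex.
class TitleMatchCondition implements LogicalStatement {
    private final String pattern;
    TitleMatchCondition(String pattern) { this.pattern = pattern; }
    public boolean getResult(Context context, Item item) {
        return item.getTitle() != null && item.getTitle().matches(pattern);
    }
}

// An Operator-style statement: AND over any number of sub-statements.
class And implements LogicalStatement {
    private final LogicalStatement[] statements;
    And(LogicalStatement... statements) { this.statements = statements; }
    public boolean getResult(Context context, Item item) {
        for (LogicalStatement s : statements) {
            if (!s.getResult(context, item)) return false;
        }
        return true;
    }
}
```

Because Operators and Conditions share the one interface, arbitrarily deep statements compose the same way the Spring examples below do.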
Filters
Filters are at the root of any test definition, and it is the filter ID that is used to load the filter in Spring configuration for other services, or with the DSpace Service Manager.
A filter bean is defined with a single “statement” property - this could be an Operator, to begin a longer logical statement, or a Condition, to perform a
simple check.
Operators
Operators are the basic logical building blocks that implement operations like AND, OR, NOT, NAND and NOR. An Operator can contain any number of
other Operators or Conditions.
Conditions
Conditions are where the actual DSpace item evaluation code is written. A Condition accepts a Map<String, Object> of parameters. Conditions don't contain any other LogicalStatement classes – they are at the bottom of the chain.
A condition could be something like MetadataValueMatchCondition, where a regex pattern and field name are passed as parameters, then tested against
actual item metadata. If the regex matches, the boolean result is true.
Typically, commonly used Conditions will be defined as beans elsewhere in the spring config and then referenced inside Filters and Operators to create
more complex statements.
Here’s a complete example of a filter definition that implements the same rules as the XOAI openAireFilter. As an exercise, some statements will be
defined as beans externally, and some will be defined inline as part of the filter.
This condition creates a new bean to test metadata values. In this case, we’re implementing “ends with” for a list of type patterns.
<!-- dc.type ends with any of the listed values, as per XOAI "driverDocumentTypeCondition" -->
<bean id="driver-document-type_condition"
class="org.dspace.content.logic.condition.MetadataValuesMatchCondition">
<property name="parameters">
<map>
<entry key="field" value="dc.type" />
<entry key="patterns">
<list>
<value>article$</value>
<value>bachelorThesis$</value>
<value>masterThesis$</value>
<value>doctoralThesis$</value>
<value>book$</value>
<value>bookPart$</value>
<value>review$</value>
<value>conferenceObject$</value>
<value>lecture$</value>
<value>workingPaper$</value>
<value>preprint$</value>
<value>report$</value>
<value>annotation$</value>
<value>contributionToPeriodical$</value>
<value>patent$</value>
<value>dataset$</value>
<value>other$</value>
</list>
</entry>
</map>
</property>
</bean>
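The values in the list above are regular expressions anchored at the end of the string with "$". Assuming the condition tests each pattern with a substring search (Pattern.find) rather than a full match, which is an assumption about its internals, the "ends with" behavior can be sketched as:

```java
import java.util.regex.Pattern;

// Sketch of the "ends with" semantics that $-anchored patterns provide.
// (How MetadataValuesMatchCondition applies its patterns internally is an
// assumption here; this only illustrates the regex behavior.)
class PatternCheck {
    static boolean matchesAny(String value, String... patterns) {
        for (String p : patterns) {
            // find() succeeds if the pattern matches anywhere; "$" pins
            // the match to the end of the metadata value.
            if (Pattern.compile(p).matcher(value).find()) return true;
        }
        return false;
    }
}
```

So a dc.type value such as "info:eu-repo/semantics/doctoralThesis" matches the "doctoralThesis$" pattern, while a value that merely contains the word elsewhere does not.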
This condition accepts group and action parameters, then inspects item policies for a match - if the supplied group can perform the action, the result is true.
<bean id="item-is-public_condition"
class="org.dspace.content.logic.condition.ReadableByGroupCondition">
<property name="parameters">
<map>
<entry key="group" value="Anonymous" />
<entry key="action" value="READ" />
</map>
</property>
</bean>
The first statement is an And Operator, with many sub-statements – four Conditions, and an Or statement.
The first two statements in this Operator are simple Conditions defined in-line, and just check for a non-empty value in a couple of metadata fields.
The third statement is a reference to the document type Condition we made earlier:
<ref bean="driver-document-type_condition" />
The fourth statement is another Operator, in this case an Or Operator with two Conditions (the is-public Condition we defined earlier, and an in-line definition of an “is-withdrawn” Condition).
The fifth statement is an in-line definition of a Condition that checks dc.relation metadata for a valid OpenAIRE identifier.
(has-title AND has-author AND has-driver-type AND (is-public OR is-withdrawn) AND has-valid-relation)
<!-- An example of an OpenAIRE compliance filter based on the same rules in xoai.xml
some sub-statements are defined within this bean, and some are referenced from earlier definitions
-->
<bean id="openaire_filter" class="org.dspace.content.logic.DefaultFilter">
<property name="statement">
<bean class="org.dspace.content.logic.operator.And">
<property name="statements">
<list>
<!-- Has a non-empty title -->
<bean id="has-title_condition"
class="org.dspace.content.logic.condition.MetadataValueMatchCondition">
<property name="parameters">
<map>
<entry key="field" value="dc.title" />
<entry key="pattern" value=".*" />
</map>
</property>
</bean>
<!-- AND has a non-empty author -->
<bean id="has-author_condition"
class="org.dspace.content.logic.condition.MetadataValueMatchCondition">
<property name="parameters">
<map>
<entry key="field" value="dc.contributor.author" />
<entry key="pattern" value=".*" />
</map>
</property>
</bean>
<!-- AND has a valid DRIVER document type (defined earlier) -->
<ref bean="driver-document-type_condition" />
<!-- AND (the item is publicly accessible OR withdrawn) -->
<bean class="org.dspace.content.logic.operator.Or">
<property name="statements">
<list>
<!-- item is public, defined earlier -->
<ref bean="item-is-public_condition" />
<!-- OR item is withdrawn, for tombstoning -->
<bean class="org.dspace.content.logic.condition.IsWithdrawnCondition">
<property name="parameters"><map></map></property>
</bean>
</list>
</property>
</bean>
<!-- AND the dc.relation is a valid OpenAIRE identifier
(starts with "info:eu-repo/grantAgreement/") -->
<bean id="has-openaire-relation_condition"
class="org.dspace.content.logic.condition.MetadataValueMatchCondition">
<property name="parameters">
<map>
<entry key="field" value="dc.relation" />
<entry key="pattern" value="^info:eu-repo/grantAgreement/" />
</map>
</property>
</bean>
</list>
</property>
</bean>
</property>
</bean>
<bean id="org.dspace.identifier.DOIIdentifierProvider"
class="org.dspace.identifier.FilteredDOIIdentifierProvider"
scope="singleton">
<property name="configurationService"
ref="org.dspace.services.ConfigurationService" />
<property name="DOIConnector"
ref="org.dspace.identifier.doi.DOIConnector" />
<property name="filterService"
ref="openaire_filter"/>
</bean>
In the provider, we just define the property with the other services and class variables:
@Required
public void setFilterService(Filter filterService) {
this.filterService = filterService;
}
Then you can actually run the tests with the service, like this:
try {
Boolean result = filterService.getResult(context, (Item) dso);
// do something with result
} catch(LogicalStatementException e) {
// ... handle exception ...
}
In the TestLogicRunner, you can see a way to get the filters by name using the DSpaceServiceManager as well.
This filter is always applied to the DOI consumer and other internal DOI service calls, and is applied by default to the `doi-organiser` tool (though it can be
optionally skipped with a command-line argument)
The filter is a spring property configured in identifier-service.xml, in the provider bean declaration.
The filterService property is optional. If it is missing from spring configuration, all items will get DOIs minted as per normal and the provider's filter service
will be null.
It is defined as follows:
<bean id="org.dspace.identifier.DOIIdentifierProvider"
      class="org.dspace.identifier.FilteredDOIIdentifierProvider"
      scope="singleton">
    <property name="configurationService"
              ref="org.dspace.services.ConfigurationService" />
    <property name="DOIConnector" ref="org.dspace.identifier.doi.DOIConnector" />
    <property name="filterService" ref="openaire_filter"/>
</bean>
In the DSpace 7 implementation, this feature can be used via the existing curation task framework, either in the CLI or in the Angular UI (when curation
tasks are implemented).
Configuration
This task is configured in ${dspace}/config/modules/curate.cfg as 'registerdoi' with the label "Register DOI".
There is a configuration file in ${dspace}/config/modules/doi-curation.cfg that can be used to customise the behaviour regarding filter skipping, and
distribution over multiple items.
443
### DOI registration curation task configuration module
##
# Should any logical filters be skipped when registering DOIs? (ie. *always* register, never filter out the item)
# Default: true
#doi-curation.skip-filter = true
##
# Should we allow the curation task to be distributed over communities / collections of items or the whole repository?
# This *could* be dangerous if run accidentally over more items than intended.
# Default: false
#doi-curation.distributed = false
Mediafilters for Transforming DSpace Content
1 MediaFilters: Transforming DSpace Content
1.1 Overview
1.2 Available Media Filters
1.3 Enabling/Disabling MediaFilters
1.4 Executing (via Command Line)
1.5 Creating Custom MediaFilters
1.5.1 Creating a simple Media Filter
1.5.2 Creating a Dynamic or "Self-Named" Format Filter
1.6 Configuration parameters
Overview
DSpace can apply filters or transformations to files/bitstreams, creating new content. Filters are included that extract text for full-text searching, and
create thumbnails for items that contain images. The media filters are controlled by the dspace filter-media script which traverses the asset store,
invoking all configured MediaFilter or FormatFilter classes on files/bitstreams (see Configuring Media Filters for more information on how they are
configured).
Each filter below is listed with its name, implementing class, description, input formats, and whether it is enabled by default.

Text Extractor (7.3 or above)
Class: org.dspace.app.mediafilter.TikaTextExtractionFilter
Description: As of 7.3, all text extraction for full-text indexing takes place in a single filter. This filter uses Apache Tika, which supports a wide variety of formats (e.g. Microsoft products, PDF, HTML, Text, etc). Additional formats may be configured from the Tika supported formats list at https://tika.apache.org/2.3.0/formats.html
Input formats: Adobe PDF, Microsoft formats (Word, PPT, Excel), CSV, HTML, RTF, Text, OpenDocument formats (Text, Spreadsheet, Presentation)
Enabled by default: yes

PDF Text Extractor (7.2 or below)
Class: org.dspace.app.mediafilter.PDFFilter
Description: Extracts the full text of Adobe PDF documents (only if text-based or OCRed) for full-text indexing. (Uses the Apache PDFBox tool.)
Input formats: Adobe PDF
Enabled by default: yes

HTML Text Extractor (7.2 or below)
Class: org.dspace.app.mediafilter.HTMLFilter
Description: Extracts the full text of HTML documents for full-text indexing. (Uses Swing's HTML Parser.)
Input formats: HTML, Text
Enabled by default: yes

Word Text Extractor (7.2 or below)
Class: org.dspace.app.mediafilter.PoiWordFilter
Description: Extracts the full text of Microsoft Word and Microsoft Word XML documents for full-text indexing. (Uses the Apache POI tools.)
Input formats: Microsoft Word, Microsoft Word XML
Enabled by default: yes

Excel Text Extractor (7.2 or below)
Class: org.dspace.app.mediafilter.ExcelFilter
Description: Extracts the full text of Microsoft Excel documents for full-text indexing. (Uses the Apache POI tools.)
Input formats: Microsoft Excel, Microsoft Excel XML
Enabled by default: yes

PowerPoint Text Extractor (7.2 or below)
Class: org.dspace.app.mediafilter.PowerPointFilter
Description: Extracts the full text of slides and notes in Microsoft PowerPoint and PowerPoint XML documents for full-text indexing. (Uses the Apache POI tools.)
Input formats: Microsoft Powerpoint, Microsoft Powerpoint XML
Enabled by default: yes

PDFBox JPEG Thumbnail
Class: org.dspace.app.mediafilter.PDFBoxThumbnail
Description: Creates thumbnail images of the first page of PDF files.
Input formats: Adobe PDF
Enabled by default: yes

JPEG Thumbnail
Class: org.dspace.app.mediafilter.JPEGFilter
Description: Creates thumbnail images of GIF, JPEG and PNG files.
Input formats: BMP, GIF, JPEG, image/png
Enabled by default: yes

Branded Preview JPEG
Class: org.dspace.app.mediafilter.BrandedPreviewJPEGFilter
Description: Creates a branded preview image for GIF, JPEG and PNG files.
Input formats: BMP, GIF, JPEG, image/png
Enabled by default: no

ImageMagick Image Thumbnail Generator
Class: org.dspace.app.mediafilter.ImageMagickImageThumbnailFilter
Description: Uses ImageMagick to generate thumbnails for image bitstreams. Requires installation of ImageMagick on your server. See ImageMagick Media Filters.
Input formats: BMP, GIF, image/png, JPG, TIFF, JPEG, JPEG 2000
Enabled by default: no

ImageMagick PDF Thumbnail Generator
Class: org.dspace.app.mediafilter.ImageMagickPdfThumbnailFilter
Description: Uses ImageMagick and Ghostscript to generate thumbnails for PDF bitstreams. Requires installation of ImageMagick and Ghostscript on your server. See ImageMagick Media Filters.
Input formats: Adobe PDF
Enabled by default: no
Please note that the filter-media script will automatically update the DSpace search index by default.
Enabling/Disabling MediaFilters
The media filter plugin configuration filter.plugins in dspace.cfg contains a list of all enabled media/format filter plugins (see Configuring Media
Filters for more information). By modifying the value of filter.plugins you can disable or enable MediaFilter plugins. The filter.plugins setting
can be set multiple times to enable multiple filters. Each filter must be enabled via its name (see "Name" column in the table above).
[dspace]/bin/dspace filter-media
With no options, this traverses the asset store, applying media filters to bitstreams, and skipping bitstreams that have already been filtered.
Adding your own filters is done by creating a class which implements the org.dspace.app.mediafilter.FormatFilter interface. See the Creating
a new Media/Format Filter topic and comments in the source file FormatFilter.java for more information. In theory filters could be implemented in any
programming language (C, Perl, etc.) However, they need to be invoked by the Java code in the Media Filter class that you create.
Alternatively, you could extend the org.dspace.app.mediafilter.MediaFilter class, which just defaults to performing no pre/post-processing of bitstreams
before or after filtering.
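As a simplified, stand-alone illustration of the shape of such a filter: the interface below is a stub standing in for org.dspace.app.mediafilter.FormatFilter (whose real method signatures differ), and MySimpleMediaFilter is a hypothetical filter that merely upper-cases plain text.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Stub standing in for org.dspace.app.mediafilter.FormatFilter.
interface SimpleFormatFilter {
    String getFilteredName(String oldName); // name for the derived bitstream
    String getBundleName();                 // bundle the result is stored in
    String getFormatString();               // output format (registry short description)
    InputStream getDestinationStream(InputStream source);
}

// A trivial filter that upper-cases plain text input.
class MySimpleMediaFilter implements SimpleFormatFilter {
    public String getFilteredName(String oldName) { return oldName + ".txt"; }
    public String getBundleName() { return "TEXT"; }
    public String getFormatString() { return "Text"; }
    public InputStream getDestinationStream(InputStream source) {
        try {
            String text = new String(source.readAllBytes(), StandardCharsets.UTF_8);
            return new ByteArrayInputStream(text.toUpperCase().getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Small helper to demonstrate the filter on a string.
class FilterDemo {
    static String apply(SimpleFormatFilter f, String input) {
        try {
            InputStream out = f.getDestinationStream(
                new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_8)));
            return new String(out.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The real framework calls the equivalent of getDestinationStream for each bitstream whose format matches the filter's configured inputFormats, then stores the returned stream as a new derivative bitstream.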
You must give your new filter a "name", by adding it and its name to the plugin.named.org.dspace.app.mediafilter.FormatFilter field in dspace.cfg. In
addition to naming your filter, make sure to specify its input formats in the filter.<class path>.inputFormats config item. Note the input formats must match
the short description field in the Bitstream Format Registry (i.e. bitstreamformatregistry table).
plugin.named.org.dspace.app.mediafilter.FormatFilter = \
  org.dspace.app.mediafilter.MySimpleMediaFilter = My Simple Text Filter, \ ...
filter.org.dspace.app.mediafilter.MySimpleMediaFilter.inputFormats = Text
If you neglect to define the inputFormats for a particular filter, the MediaFilterManager will never call that filter, since it will never find a bitstream which has
a format matching that filter's input format(s).
If you have a complex Media Filter class which actually performs different filtering for different formats (e.g. conversion from Word to PDF and conversion from Excel to CSV), you should define it as described in Chapter 13.3.2.2.
Since SelfNamedPlugins are self-named (as stated), they must provide the various names the plugin uses by defining a getPluginNames() method.
Generally speaking, each "name" the plugin uses should correspond to a different type of filter it implements (e.g. "Word2PDF" and "Excel2CSV" are two
good names for a complex media filter which performs both Word to PDF and Excel to CSV conversions).
Self-Named Media/Format Filters are also configured differently in dspace.cfg. Below is a general template for a Self Named Filter (defined by an
imaginary MyComplexMediaFilter class, which can perform both Word to PDF and Excel to CSV conversions):
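Sketching that template for the imaginary MyComplexMediaFilter class (the class name, plugin names, and format values are the illustrative examples from the text, not real configuration):

```
# Self-Named Filters are enabled via the "selfnamed" plugin configuration
plugin.selfnamed.org.dspace.app.mediafilter.FormatFilter = \
    org.dspace.app.mediafilter.MyComplexMediaFilter

# Each named plugin defined by the filter declares its own input formats
filter.org.dspace.app.mediafilter.MyComplexMediaFilter.Word2PDF.inputFormats = Microsoft Word
filter.org.dspace.app.mediafilter.MyComplexMediaFilter.Excel2CSV.inputFormats = Microsoft Excel
```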
As shown above, each Self-Named Filter class must be listed in the plugin.selfnamed.org.dspace.app.mediafilter.FormatFilter item in ds
pace.cfg. In addition, each Self-Named Filter must define the input formats for each named plugin defined by that filter. In the above example the MyCo
mplexMediaFilter class is assumed to have defined two named plugins, Word2PDF and Excel2CSV. So, these two valid plugin names ("Word2PDF" and
"Excel2CSV") must be returned by the getPluginNames() method of the MyComplexMediaFilter class.
These named plugins take different input formats as defined above (see the corresponding inputFormats setting).
If you neglect to define the inputFormats for a particular named plugin, the MediaFilterManager will never call that plugin, since it will never find a
bitstream which has a format matching that plugin's input format(s).
For a particular Self-Named Filter, you are also welcome to define additional configuration settings in dspace.cfg. To continue with our current example,
each of our imaginary plugins actually results in a different output format (Word2PDF creates "Adobe PDF", while Excel2CSV creates "Comma Separated
Values"). To allow this complex Media Filter to be even more configurable (especially across institutions, with potential different "Bitstream Format
Registries"), you may wish to allow for the output format to be customizable for each named plugin. For example:
#Define output formats for each named plugin
filter.org.dspace.app.mediafilter.MyComplexMediaFilter.Word2PDF.outputFormat = Adobe PDF
filter.org.dspace.app.mediafilter.MyComplexMediaFilter.Excel2CSV.outputFormat = Comma Separated Values
Any custom configuration fields in dspace.cfg defined by your filter are ignored by the MediaFilterManager, so it is up to your custom media filter class to
read those configurations and apply them as necessary. For example, you could use the following sample Java code in your MyComplexMediaFilter class
to read these custom outputFormat configurations from dspace.cfg:
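A stand-alone sketch of that lookup, using java.util.Properties in place of DSpace's ConfigurationService (the key layout follows the outputFormat example above; a real filter would read its configuration through the ConfigurationService instead):

```java
import java.util.Properties;

// Builds and reads the per-plugin key, e.g.
// filter.org.dspace.app.mediafilter.MyComplexMediaFilter.Word2PDF.outputFormat
class OutputFormatLookup {
    static String outputFormatFor(Properties config, String filterClass, String pluginName) {
        return config.getProperty("filter." + filterClass + "." + pluginName + ".outputFormat");
    }
}
```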
Configuration parameters
Property: textextractor.max-chars
Informational Note: By default, the "Text Extractor" only extracts the first 100,000 characters of text for full-text indexing. This setting allows you to increase or decrease that default. Set to -1 for no maximum. Keep in mind that larger values (or -1) are more likely to encounter OutOfMemoryException errors when extracting text from very large files. In those scenarios, you may wish to consider instead enabling "textextractor.use-temp-file" below to better control memory usage.

Property: textextractor.use-temp-file
Informational Note: By default, the "Text Extractor" will perform all text extraction in memory (i.e. textextractor.use-temp-file=false). This ensures text extraction runs quickly, but it has the risk of hitting OutOfMemoryException errors if you either increase "textextractor.max-chars" or simply don't have much available memory on the server. In those scenarios, you can set "textextractor.use-temp-file=true" in order to tell the text extraction process to extract all text using a temporary file. This decreases the memory usage of the text extraction process, but will run slightly slower.

Property: filter.org.dspace.app.mediafilter.publicPermission
Informational Note: By default mediafilter derivatives / thumbnails inherit the permissions of the parent bitstream, but you can override this, in case you want to make derivative / thumbnail content publicly accessible, typically the thumbnails of objects for the browse list. List the MediaFilter names that should get public access permissions. Any media filters not listed will instead inherit the permissions of the parent bitstream.
ImageMagick Media Filters
ImageMagick Media Filters
As of DSpace 7.6, the ImageMagick media filter also supports creating thumbnails of video (MP4) files, provided that "ffmpeg" is installed locally. See
instructions below.
Overview
The ImageMagick Media Filters provide consistent, high quality thumbnails for image bitstreams, PDF bitstreams and video (MP4) bitstreams.
These filters require a separate software installation of the conversion utilities: ImageMagick, Ghostscript (to support PDF thumbnails) and/or ffmpeg (to
support MP4 thumbnails).
The media filters use the library im4java to invoke the conversion utilities. This library constructs a conversion command and launches a sub-process to perform the generation of media files.
Installation
Before ImageMagick Media Filters can be used, you must setup ImageMagick (and optionally Ghostscript) as follows:
1. Install ImageMagick on your server. The installation process differs based on your operating system. For example, on Debian/Ubuntu, it's similar
to this:
2. If you wish to generate PDF thumbnails, install Ghostscript on your server. The installation process differs based on your operating system. For
example, on Debian/Ubuntu, it's similar to this:
3. (New in 7.6) If you wish to generate MP4 (video) thumbnails, install FFmpeg on your server. The installation process differs based on your
operating system. For example, on Debian/Ubuntu, it's similar to this:
4. The ImageMagick, Ghostscript, and FFmpeg executables should be accessible from the same directory (e.g. /usr/bin)
a. This directory MUST be defined in the org.dspace.app.mediafilter.ImageMagickThumbnailFilter.ProcessStarter configuration as described below.
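As a sketch of steps 1-3 on Debian/Ubuntu (package names may differ on other distributions or releases):

```shell
sudo apt-get update
# Step 1: ImageMagick (image thumbnails)
sudo apt-get install -y imagemagick
# Step 2: Ghostscript (PDF thumbnails)
sudo apt-get install -y ghostscript
# Step 3: FFmpeg (MP4 video thumbnails, DSpace 7.6+)
sudo apt-get install -y ffmpeg
```

On these distributions, all three executables land in /usr/bin, satisfying step 4.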
DSpace Configuration
In the filter.plugins section of your dspace.cfg (or local.cfg) file, specify the ImageMagick media filters you wish to use.
local.cfg
# Make sure to always keep this plugin enabled if you want to support search within text documents
filter.plugins = Text Extractor
# NOTE: When "ImageMagick Image Thumbnail" is enabled, the default "JPEG Thumbnail" should NOT be enabled
filter.plugins = ImageMagick Image Thumbnail
# NOTE: When "ImageMagick PDF Thumbnail" is enabled, the default "PDFBox JPEG Thumbnail" should NOT be enabled
# Requires Ghostscript to also be installed
filter.plugins = ImageMagick PDF Thumbnail
This will activate the following settings which are already present in dspace.cfg (these do NOT need to be added, as they already exist)
plugin.named.org.dspace.app.mediafilter.FormatFilter = org.dspace.app.mediafilter.ImageMagickImageThumbnailFilter = ImageMagick Image Thumbnail
plugin.named.org.dspace.app.mediafilter.FormatFilter = org.dspace.app.mediafilter.ImageMagickPdfThumbnailFilter = ImageMagick PDF Thumbnail
plugin.named.org.dspace.app.mediafilter.FormatFilter = org.dspace.app.mediafilter.ImageMagickVideoThumbnailFilter = ImageMagick Video Thumbnail
These media filters have several properties which can be configured.
Thumbnail Dimensions
The following properties are used to define the dimensions of the generated thumbnails:
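The shipped defaults in dspace.cfg look similar to the following (verify the exact property names and values against your own dspace.cfg, as they may differ between releases):

```
# Maximum width and height (in pixels) of generated thumbnails
thumbnail.maxwidth = 80
thumbnail.maxheight = 80
```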
org.dspace.app.mediafilter.ImageMagickThumbnailFilter.ProcessStarter = /usr/bin
The ImageMagick media filter uses the bitstream description field to identify bitstreams that it has created, using the following setting. Bitstreams containing this label will be overwritten only if the -f option is passed to the filter-media process.
org.dspace.app.mediafilter.ImageMagickThumbnailFilter.bitstreamDescription = IM Thumbnail
Thumbnail descriptions that do not match either of the patterns listed above are presumed to be manually uploaded thumbnails. These thumbnails will not
be replaced even if the -f option is passed to the filter media process.
Flatten
DSpace uses the JPEG format for thumbnails. While JPEG doesn't support transparency, PDF, PNG and other formats do. Since those formats are used as source material for thumbnails, DSpace has to handle transparency during thumbnail generation. With certain combinations of ImageMagick and Ghostscript versions, completely transparent areas may become black. As a solution, ImageMagick recommends flattening images extracted from PDFs before they are stored as JPEGs.
Since DSpace 5.2 the ImageMagick media filter flattens thumbnails extracted from PDFs. If you run into problems caused by flattening of the extracted
images, you can switch the flattening off by setting the following property in dspace.cfg to false:
org.dspace.app.mediafilter.ImageMagickThumbnailFilter.flatten = false
ICC Profiles
PDFs optimized for physical printing often use the CMYK color space. On the web, however, the de facto color system is sRGB. By default, DSpace's
ImageMagick-based thumbnailing system will create thumbnails that use the same color space as the source PDF. Most web browsers are not able to
correctly display images that use the CMYK color space, which leads to images with visibly inaccurate colors.
If you are using Ghostscript version 9 or above, it is possible for DSpace to correctly convert images from CMYK to sRGB by providing it with appropriate
ICC color profiles to use during thumbnail creation. Default ones are provided by most Ghostscript installations (version 9 or above). The following
configuration options tell DSpace where those ICC profiles are located.
# org.dspace.app.mediafilter.ImageMagickThumbnailFilter.cmyk_profile = /usr/share/ghostscript/9.18/iccprofiles/default_cmyk.icc
# org.dspace.app.mediafilter.ImageMagickThumbnailFilter.srgb_profile = /usr/share/ghostscript/9.18/iccprofiles/default_rgb.icc
You may need to adjust those paths for your OS or the version of Ghostscript that you have.
Providing ICC profiles to ImageMagick is optional. If these configuration properties are unset, no profiles will be supplied to ImageMagick, and thumbnails
produced from PDFs using the CMYK color space will also use CMYK.

A related optional property controls the density (DPI) used when rasterizing the source PDF before the thumbnail is scaled down; higher values generally produce sharper thumbnails at the cost of processing time:

# org.dspace.app.mediafilter.ImageMagickThumbnailFilter.density = 144

The effect is most notable on PDFs with a lot of text, gradients, or curved lines. See the pull request implementing this feature for more information and comparisons.
Additional Customization
The ImageMagick conversion software provides a large number of conversion options. Subclasses of these media filters could be written to take
advantage of the additional conversion properties available in the software.
Note: The PDF thumbnail generator is hard-coded to generate a thumbnail from the first page of the PDF.
When running the filter-media process, you may encounter an error similar to:

ERROR filtering, skipping bitstream:
Item Handle: 1234/5678
Bundle Name: ORIGINAL
File Size: 30406135
Checksum: c1df4b3a4755e9bed956383b61fc5042 (MD5)
Asset Store: 0
org.im4java.core.CommandException: org.im4java.core.CommandException: convert.im6: not authorized `/tmp
/impdfthumb6294641076817830415.pdf' @ error/constitute.c/ReadImage/454.
These may be caused by a change in your ImageMagick policy configuration on your server.
In Ubuntu, the default "policy.xml" was recently updated to exclude all Ghostscript formats (including PDF, PS, etc). See this ticket: https://bugs.launchpad.net/ubuntu/+source/imagemagick/+bug/1796563
This exclusion was implemented to workaround a security vulnerability in Ghostscript reported here: https://www.kb.cert.org/vuls/id/332928
According to that vulnerability report, this was patched in Ghostscript v9.24 (or above)
Fixing the error above requires re-enabling ImageMagick's ability to process Ghostscript format types. That can be done by simply commenting out those new "policy" lines in the configuration file (surround them with <!-- and --> to comment them out).
Be aware that you MUST ensure you are running Ghostscript v9.24 or later to ensure that you are not at risk for the above security vulnerability in older
versions of Ghostscript.
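For reference, the lines in question in Ubuntu's /etc/ImageMagick-6/policy.xml look similar to the following sketch; commenting them out (as shown) re-enables PDF/PS processing. The exact list of patterns varies by ImageMagick version, so compare against your own file:

```xml
<!--
<policy domain="coder" rights="none" pattern="PS" />
<policy domain="coder" rights="none" pattern="EPS" />
<policy domain="coder" rights="none" pattern="PDF" />
<policy domain="coder" rights="none" pattern="XPS" />
-->
```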
File: video.mp4.jpg
ERROR filtering, skipping bitstream:
Item Handle: 1234/5678
Bundle Name: ORIGINAL
File Size: 146761357
Checksum: 735ceb1b6b249afc84a5bb1b87ae0c02 (MD5)
Asset Store: 0
org.im4java.core.CommandException: convert-im6.q16: cache resources exhausted `/tmp/magick-64dziU-
1nQJjQHZYu4_R1fFP4l9en5iL.pam' @ error/cache.c/OpenPixelCache/4095.
These may be caused by overly conservative resource policies in your policy.xml file. As an example, the default values in Debian 11 (Bullseye) are located at /etc/ImageMagick-6/policy.xml and are:
/etc/ImageMagick-6/policy.xml
<policymap>
<!-- <policy domain="resource" name="temporary-path" value="/tmp"/> -->
<policy domain="resource" name="memory" value="256MiB"/>
<policy domain="resource" name="map" value="512MiB"/>
<policy domain="resource" name="width" value="16KP"/>
<policy domain="resource" name="height" value="16KP"/>
<!-- <policy domain="resource" name="list-length" value="128"/> -->
<policy domain="resource" name="area" value="128MP"/>
<policy domain="resource" name="disk" value="1GiB"/>
</policymap>
To avoid the cache resources exhausted error, try increasing the resource limits policies. You may want to start by increasing the memory and disk
policies (disk cache is used when the memory limit is reached). The actual values have to be adjusted depending on the size of your video bitstreams and
the actual resources available in your installation. For example:
/etc/ImageMagick-6/policy.xml
<policymap>
<!-- <policy domain="resource" name="temporary-path" value="/tmp"/> -->
<policy domain="resource" name="memory" value="4GiB"/> <!-- memory limit increased from 256MiB to 4GiB -->
<policy domain="resource" name="map" value="512MiB"/>
<policy domain="resource" name="width" value="16KP"/>
<policy domain="resource" name="height" value="16KP"/>
<!-- <policy domain="resource" name="list-length" value="128"/> -->
<policy domain="resource" name="area" value="128MP"/>
<policy domain="resource" name="disk" value="4GiB"/> <!-- disk limit increased from 1GiB to 4GiB -->
</policymap>
Once the limits are properly set, a successful execution of the filter should show a message similar to:
File: video.mp4.jpg
FILTERED: bitstream 12345678-abcd-efgh-ijkl-1234567890ab (item: 1234/5678) and created 'video.mp4.jpg'
Performance Tuning DSpace
1 Bare Minimum Requirements
2 Performance Tuning the Frontend (UI)
2.1 Use "cluster mode" of PM2 to avoid Node.js using a single CPU
2.2 Give Node.js more memory
2.3 Limit which pages are processed via Server Side Rendering (SSR)
2.4 Turn on (or increase) caching of Server-Side Rendered pages
3 Performance Tuning the Backend (REST API)
3.1 Give Tomcat More Memory
3.1.1 Give Tomcat More Java Heap Memory
3.1.2 Give Tomcat More Java PermGen Memory
3.1.3 Choosing the size of memory spaces allocated to DSpace Backend
3.2 Give the Command Line Tools More Memory
3.2.1 Give the Command Line Tools More Java Heap Memory
3.2.2 Give the Command Line Tools More Java PermGen Space Memory
4 Give PostgreSQL Database More Memory
5 Performance Tuning Solr
The software DSpace relies on does not come out of the box optimized for large repositories. Here are some tips to make it all run faster.
2GB of memory for the Frontend (UI) / Node.js. Highly active sites will need more.
1GB of memory for the Backend (REST API) / JVM / Tomcat. Highly active sites will need more.
512MB of memory for PostgreSQL database. Highly active sites will need more.
512MB of memory for Solr. Highly active sites may need more.
Extra memory may be required for command line scripts (which get kicked off in a separate JVM)
Keep in mind, because the frontend & backend can be run on separate servers, you can split this memory across two (or more) servers. You can even
choose to run PostgreSQL or Solr either alongside the backend or on their own dedicated server.
The DSpace frontend (UI) will often require several CPUs, especially if you wish to use "cluster mode" (see below) to better scale your application. A smaller application may be able to use 4-6 CPU cores, while highly active sites may require additional CPU power. CPU is most often needed for the frontend's Angular Server Side Rendering (again, see "cluster mode" notes below) and for any batch processing / command line scripts on the backend.
1. The first option is to add the "exec_mode" and "instances" settings to your JSON configuration as follows. You also may want to set the "max_memory_restart" option to avoid PM2 using too much memory. These three settings are described in more detail below. NOTE: make sure to start (or restart) your site to enable these settings (e.g. pm2 start dspace-ui.json)
dspace-ui.json
{
"apps": [
{
"name": "dspace-ui",
"cwd": "/full/path/to/dspace-ui-deploy",
"script": "dist/server/main.js",
"instances": "max",
"exec_mode": "cluster",
"env": {
"NODE_ENV": "production"
},
"max_memory_restart": "500M"
}
]
}
# Start the "dspace-ui" app. Cluster it across all available CPUs with a maximum memory of 500MB per CPU.
# This command is equivalent to the example cluster settings in the "dspace-ui.json" file above.
pm2 start dspace-ui.json -i max --max-memory-restart 500M
If you want to increase the memory available to Node.js, you can set the NODE_OPTIONS environment variable:
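For example, to give Node.js a 4GB heap before starting the UI (the 4096 value is illustrative; size it to your server's available RAM):

```shell
# --max-old-space-size is specified in megabytes
export NODE_OPTIONS="--max-old-space-size=4096"
pm2 start dspace-ui.json
```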
Limit which pages are processed via Server Side Rendering (SSR)
While enabling Server Side Rendering (SSR) is extremely important for Search Engine Optimization, it can also be very resource intensive for large pages
or highly active sites. Server Side Rendering involves building the entire HTML for the page in Node.js (on your server) before sending the page back to
the client/user. Most humans only encounter SSR briefly, when they initially visit your site. However, bots may only interact with SSR, especially if they
are unable to process Javascript. This is true even for Google Scholar, whose bots will only use SSR generated pages to index your site.
In order to maximize the performance of SSR, by default DSpace will minimize the pages and Angular components that are processed during server-side rendering. You may wish to review the default settings to ensure they are appropriate for your site. See the Server Side Rendering (SSR) settings in User Interface Configuration.
While server-side-rendering is highly recommended on all sites, it can result in Node.js having to pre-generate many HTML pages at once when a site has
a large number of simultaneous users/bots. This may cause Node.js to spend a lot of time processing server-side-rendered content, slowing down the
entire site.
Therefore, DSpace provides some basic caching of server-side rendered pages, which allows the same pre-generated HTML to be sent to many users
/bots at once & decreases the frequency of server-side rendering.
These settings are documented at User Interface Configuration: Cache Settings - Server Side Rendering (SSR)
Performance Tuning the Backend (REST API)
At the time of writing, DSpace recommends you should give Tomcat >= 512MB of Java Heap Memory to ensure optimal DSpace operation. Most larger or more highly active DSpace installations, however, tend to allocate 1024MB (1GB) to 2048MB (2GB) or more of Java Heap Memory.
Performance tuning in Java basically boils down to memory. If you are seeing "java.lang.OutOfMemoryError: Java heap space" errors, this is a
sure sign that Tomcat isn't being provided with enough Heap Memory.
Tomcat is especially memory hungry, and will benefit from being given lots of RAM. To set the amount of memory available to Tomcat, use either the JAVA
_OPTS or CATALINA_OPTS environment variable, e.g:
CATALINA_OPTS=-Xmx512m -Xms512m
OR
JAVA_OPTS=-Xmx512m -Xms512m
The above example sets the maximum Java Heap memory to 512MB.
You can use either environment variable. JAVA_OPTS is also used by other Java programs (besides just Tomcat). CATALINA_OPTS is only used by
Tomcat. So, if you only want to tweak the memory available to Tomcat, it is recommended that you use CATALINA_OPTS. If you set both CATALINA_OPTS
and JAVA_OPTS, Tomcat will default to using the settings in CATALINA_OPTS.
If the machine is dedicated to DSpace, a decent rule of thumb is to give Tomcat half of the memory on your machine. At a minimum, you should give Tomcat >= 512MB of memory for optimal DSpace operation. (NOTE: As your DSpace instance gets larger in size, you may need to increase this number to the several GB range.) The latest guidance is to also set -Xms to the same value as -Xmx for server applications such as Tomcat.
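One common place to set these variables is a $CATALINA_HOME/bin/setenv.sh file, which Tomcat's startup scripts read automatically (create it if it doesn't exist). A minimal sketch, with illustrative sizes:

```shell
# $CATALINA_HOME/bin/setenv.sh
# Give Tomcat a 2GB heap; -Xms matches -Xmx per current guidance for server applications
export CATALINA_OPTS="-Xmx2048m -Xms2048m"
```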
At the time of writing, DSpace recommends you should give Tomcat >= 128MB of PermGen Space to ensure optimal DSpace operation.
If you are seeing "java.lang.OutOfMemoryError: PermGen space" errors, this is a sure sign that Tomcat is running out of PermGen memory. (More info on PermGen Space: https://frankkieviet.blogspot.com/2006/10/classloader-leaks-dreaded-permgen-space.html)
To increase the amount of PermGen memory available to Tomcat (default=64MB), use either the JAVA_OPTS or CATALINA_OPTS environment variable, e.g.:
CATALINA_OPTS=-XX:MaxPermSize=128m
OR
JAVA_OPTS=-XX:MaxPermSize=128m
You can use either environment variable. JAVA_OPTS is also used by other Java programs (besides just Tomcat). CATALINA_OPTS is only used by
Tomcat. So, if you only want to tweak the memory available to Tomcat, it is recommended that you use CATALINA_OPTS. If you set both CATALINA_OPTS
and JAVA_OPTS, Tomcat will default to using the settings in CATALINA_OPTS.
Please note that you can obviously set both Tomcat's Heap space and PermGen Space together similar to:
CATALINA_OPTS=-Xmx512m -Xms512m -XX:MaxPermSize=128m
On an Ubuntu machine (10.04) at least, the file /etc/default/tomcat6 appears to be the best place to put these environmental variables.
psi-probe is a webapp that can be deployed in DSpace and be used to watch memory usage of the other webapps deployed in the same instance of
Tomcat (in our case, the DSpace server webapp).
cd [dspace]/webapps/
unzip ~/probe-3.1.0.zip
unzip probe.war -d probe
3. Add a Context element in Tomcat's configuration, and make it privileged (so that it can monitor the other webapps):
EITHER in $CATALINA_HOME/conf/server.xml
OR in $CATALINA_HOME/conf/Catalina/localhost/probe.xml
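A minimal sketch of such a Context element (note: when using a separate probe.xml file, the context path is derived from the file name, so the path attribute can be omitted there):

```xml
<Context path="/probe" docBase="probe" privileged="true" />
```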
4. Edit $CATALINA_HOME/conf/tomcat-users.xml to add a user for logging into psi-probe (see more in https://github.com/psi-probe/psi-probe/wiki/InstallationApacheTomcat)
5. Restart Tomcat
6. Open http://yourdspace.com:8080/probe/ (edit domain and port number as necessary) in your browser and use the username and password
from tomcat-users.xml to log in.
In the "System Information" tab, go to the "Memory utilization" menu. Note how much memory Tomcat is using upon startup and use a slightly higher value
than that for the -Xms parameter (initial Java heap size). Watch how big the various memory spaces get over time (hours or days), as you run various
common DSpace tasks that put load on memory, including indexing, reindexing, importing items into the oai index etc. These maximum values will
determine the -Xmx parameter (maximum Java heap size). Watching PS Perm Gen grow over time will let you choose the value for the -XX:MaxPermSize parameter.
By default, DSpace only provides 256MB of maximum heap memory to its command-line tools.
If you'd like to provide more memory to command-line tools, you can do so via the JAVA_OPTS environment variable (which is used by the [dspace]/bin/dspace script). Again, it's the same syntax as above:
JAVA_OPTS=-Xmx512m -Xms512m
This is especially useful for big batch jobs, which may require additional memory.
You can also edit the [dspace]/bin/dspace script and add the environmental variables to the script directly.
Give the Command Line Tools More Java PermGen Space Memory
Similar to Tomcat, you may also need to give the DSpace Java-based command-line tools more PermGen Space. If you are seeing "java.lang.
OutOfMemoryError: PermGen space" errors, when running a command-line tool, this is a sure sign that it isn't being provided with enough PermGen
Space.
If you'd like to provide more PermGen Space to command-line tools, you can do so via the JAVA_OPTS environment variable (which is used by the [dspace]/bin/dspace script). Again, it's the same syntax as above:
JAVA_OPTS=-XX:MaxPermSize=128m
This is especially useful for big batch jobs, which may require additional memory.
Please note that you can obviously set both Java's Heap space and PermGen Space together similar to:
JAVA_OPTS=-Xmx512m -Xms512m -XX:MaxPermSize=128m
For more hints/tips with PostgreSQL configurations and performance tuning, see also:
PostgresPerformanceTuning
PostgresqlConfiguration
Ping or Healthcheck endpoints for confirming DSpace services are functional
For some installations of DSpace, it might be helpful to have a URL you can configure as a healthcheck for some sort of monitoring system (Monit, Eye).
Some installations use load balancers, and those load balancers need a URL to check to confirm the system is functioning correctly. Here are some
suggestions for you to use.
Frontend
/home
Be sure to append that path to the main URL of your DSpace instance's frontend URL. For example: https://demo7.dspace.org/home
Backend
/server/api/core/collections
/server/api/core/sites
Be sure to append these paths to the main URL of your DSpace instance's backend URL. For example: https://api7.dspace.org/server/api/core/collections
Both of those endpoints will throw an error if Solr is down or similar, and both are anonymously available (no login required).
There is also a Spring Boot Actuator health endpoint:

/server/actuator/health

When the backend and its dependent services are functioning, it responds with:

{"status":"UP"}
Scheduled Tasks via Cron
Several DSpace features require that a script is run regularly (via cron, or similar). Some of these features include:
the e-mail subscription feature that alerts users of new items being deposited;
the 'media filter' tool, that generates thumbnails of images and extracts the full-text of documents for indexing;
the (optional) 'checksum checker' that tests the bitstreams in your repository for corruption;
the (optional) registration of DOIs using DataCite as registration agency;
and the updating of the geolocation database used to enrich usage statistics (at this writing, the database publisher issues monthly updates).
These regularly scheduled tasks should be setup via either cron (for Linux/Mac OSX) or Windows Task Scheduler (for Windows).
crontab -e
While every DSpace installation is unique, in order to get the most out of DSpace, we highly recommend enabling these basic cron settings (the settings
are described in the comments):
#-----------------
# GLOBAL VARIABLES
#-----------------
# Full path of your local DSpace Installation (e.g. /home/dspace or /dspace or similar)
# MAKE SURE TO CHANGE THIS VALUE!!!
DSPACE = [dspace]
# Shell to use
SHELL=/bin/sh
#--------------
# HOURLY TASKS (Recommended to be run multiple times per day, if possible)
# At a minimum these tasks should be run daily.
#--------------
# Send information about new and changed DOIs to the DOI registration agency
# NOTE: ONLY NECESSARY IF YOU REGISTER DOIS USING DATACITE AS REGISTRATION AGENCY (Disabled by default)
# 0 4,12,20 * * * $DSPACE/bin/dspace doi-organiser -u -q ; $DSPACE/bin/dspace doi-organiser -s -q ; $DSPACE/bin/dspace doi-organiser -r -q ; $DSPACE/bin/dspace doi-organiser -d -q
#----------------
# DAILY TASKS
# (Recommended to be run once per day. Feel free to tweak the scheduled times below.)
#----------------
# Update the OAI-PMH index with the newest content at midnight every day
# REQUIRED to update content available in OAI-PMH (However, it can be removed if you do not enable OAI-PMH)
0 0 * * * $DSPACE/bin/dspace oai import > /dev/null
# Clean and Update the Discovery indexes at midnight every day
# (This ensures that any deleted documents are cleaned from the Discovery search/browse index)
# RECOMMENDED to ensure your search/browse index stays fresh.
0 0 * * * $DSPACE/bin/dspace index-discovery > /dev/null
# run the index-authority script once a day at 12:45 to ensure the Solr Authority cache is up to date
45 0 * * * $DSPACE/bin/dspace index-authority > /dev/null
# Cleanup Web Spiders from DSpace Statistics Solr Index at 01:00 every day
# (This removes any known web spiders from your usage statistics)
# RECOMMENDED if you are running Solr Statistics.
0 1 * * * $DSPACE/bin/dspace stats-util -i
#----------------
# WEEKLY TASKS
# (Recommended to be run once per week, but can be run more or less frequently, based on your local needs/policies)
#----------------
# Send out "weekly" update subscription e-mails at 02:00 every Sunday
# (This sends an email to any users who have "subscribed" to a Community/Collection, notifying them of newly added content.)
# REQUIRED for weekly "Email Subscriptions" to work properly.
0 2 * * 0 $DSPACE/bin/dspace subscription-send -f W
#----------------
# MONTHLY TASKS
# (Recommended to be run once per month, but can be run more or less frequently, based on your local needs/policies)
#----------------
# Send out "monthly" update subscription e-mails at 02:00, on the first of every month
# (This sends an email to any users who have "subscribed" to a Community/Collection, notifying them of newly added content.)
# REQUIRED for monthly "Email Subscriptions" to work properly.
0 2 1 * * $DSPACE/bin/dspace subscription-send -f M
# Permanently delete any bitstreams flagged as "deleted" in DSpace, on the first of every month at 01:00
# (This ensures that any files which were deleted from DSpace are actually removed from your local filesystem.
# By default they are just marked as deleted, but are not removed from the filesystem.)
# REQUIRED to fully remove deleted content files from the "assetstore" folder
0 1 1 * * $DSPACE/bin/dspace cleanup > /dev/null
Search Engine Optimization
Please be aware that individual search engines also have their own guidelines and recommendations for inclusion. While the guidelines below apply to most DSpace sites, you may also wish to review these guidelines for specific search engines:
"Indexing Repositories: Pitfalls and Best Practices" talk from Anurag Acharya (co-creator of Google Scholar) presented at the Open Repositories
2015 conference
Google Scholar Inclusion Guidelines
Bing Webmaster Guidelines
DSpace comes with tools that ensure major search engines (Google, Bing, Yahoo, Google Scholar) are able to easily and effectively index all your content. However, many of these tools require some basic setup. Here's how to ensure your site is indexed.
1. Keep your DSpace up to date. We are constantly adding new indexing improvements in new releases
2. Ensure your DSpace is visible to search engines.
3. Ensure your proxy is passing X-Forwarded headers to the User Interface
4. Ensure the user interface is using server-side rendering (enabled by default)
5. Ensure the sitemaps feature is enabled. (enabled by default)
6. Ensure your robots.txt allows access to item "splash" pages and full text.
7. Ensure item metadata appears in HTML headers correctly.
8. Avoid redirecting file downloads to Item landing pages
9. Turn OFF any generation of PDF cover pages
10. As an aside, it's worth noting that OAI-PMH is generally not useful to search engines. OAI-PMH has its own uses, but do not expect search
engines to use it.
Additional minor improvements / bug fixes have been made to more recent releases of DSpace.
If your site is not indexed at all, all search engines have a way to add your URL, e.g.:
Google: http://www.google.com/addurl
Yahoo: http://siteexplorer.search.yahoo.com/submit
Bing: http://www.bing.com/docs/submit.aspx
Because most DSpace sites use some sort of proxy (e.g. Apache web server or Nginx or similar), this requires that the proxy be configured to pass along
proper X-Forwarded-* headers, especially X-Forwarded-Host and X-Forwarded-Proto. For example in Apache HTTPD, you can do something like this:
# This lets DSpace know it is running behind HTTPS and what hostname is currently used
# (requires installing/enabling mod_headers)
RequestHeader set X-Forwarded-Proto https
RequestHeader set X-Forwarded-Host my.dspace.edu
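An equivalent sketch for Nginx, placed inside the location block that proxies requests to the UI (the hostname shown is an example):

```
proxy_set_header X-Forwarded-Proto https;
proxy_set_header X-Forwarded-Host my.dspace.edu;
```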
Because the DSpace user interface is based on Angular.io (which is a javascript framework), you MUST have server-side rendering enabled (which is the default) for search engines to fully index your site. Server-side rendering allows your site to still function even when Javascript is turned off in a user's browser. Some web crawlers do not support Javascript (e.g. Google Scholar), so they will only interact with this server-side rendered content.
If you are unsure whether server-side rendering (SSR) is enabled, you can check whether your site is accessible when Javascript is turned off. For example, in Chrome you can disable JavaScript via the Developer Tools Command Menu (Ctrl/Cmd+Shift+P, then "Disable JavaScript") and reload the page: the site should still render and be navigable.
DSpace uses Angular Universal for server-side rendering, and it's enabled by default in Production mode via our production environment initialization in src/environments/environment.production.ts.
In order to maximize the performance of SSR, by default DSpace will minimize the pages and Angular components that are processed during server-side rendering. You may wish to review the default settings to ensure they are appropriate for your site. See the "Server Side Rendering (SSR) settings" in User Interface Configuration.
You can modify this schedule by using the Cron syntax defined at https://www.quartz-scheduler.org/api/2.3.0/org/quartz/CronTrigger.html . Any
modifications can be placed in your local.cfg.
If you want to disable this automated scheduler, you can either comment it out, or set it to a single "-" (dash) in your local.cfg
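As a sketch (the property is named sitemap.cron in recent DSpace 7 releases; verify the exact name and default against your dspace.cfg), note that the scheduler uses Quartz cron syntax, which includes a leading seconds field:

```
# Regenerate sitemaps at 01:15 every day (fields: sec min hour day-of-month month day-of-week)
sitemap.cron = 0 15 1 * * ?
# Or disable the automated scheduler entirely:
# sitemap.cron = -
```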
Again, we highly recommend keeping them enabled. However, you may choose to disable this scheduler if you wish to define these in your local system
cron settings.
Once you've enabled your sitemaps, they will be accessible under your UI's base URL at /sitemap_index.xml (XML sitemaps) and /sitemap_index.html (HTML sitemaps).
So, for example, if your "dspace.ui.url = https://mysite.org" in your "dspace.cfg" configuration file, then the HTML sitemaps would be at: "https://mysite.org/sitemap_index.html"
By default, the Sitemap URLs also will appear in your UI's robots.txt (in order to announce them to search engines):
WARNING: Keep in mind, you do NOT need to run these manually in most situations, as sitemaps are autoupdated on a regular schedule (see
documentation above)
Option: --help
Meaning: Display help / usage information for the sitemap generation script.
Option: --no_sitemaps
Meaning: Do not generate the sitemaps.org-format (XML) sitemaps.
Option: --no_htmlmap
Meaning: Do not generate the HTML sitemaps.
You can configure the list of "all search engines" by setting the value of sitemap.engineurls in dspace.cfg.
The trick here is to minimize load on your server, but without actually blocking anything vital for indexing. Search engines need to be able to index item,
collection and community pages, and all bitstreams within items – full-text access is critically important for effective indexing, e.g. for citation analysis as
well as the usual keyword searching.
If you have restricted content on your site, search engines will not be able to access it; they access all pages as an anonymous user.
Ensure that your robots.txt file is at the top level of your site: i.e. at http://repo.foo.edu/robots.txt, and NOT e.g. http://repo.foo.edu/dspace/robots.txt. If your
DSpace instance is served from e.g. http://repo.foo.edu/dspace/, you'll need to add /dspace to all the paths in the examples below (e.g. /dspace/browse-
subject).
In particular, do NOT block the following paths in robots.txt, as search engines require access to them for effective indexing:

/bitstreams
/browse/* (UNLESS USING SITEMAPS)
/collections
/communities
/community-list (UNLESS USING SITEMAPS)
/entities/*
/handle
/items
The highly recommended settings are uncommented. Additional, optional settings are displayed in comments – based on your local configuration you may
wish to enable them by uncommenting the corresponding "Disallow:" line.
##########################
# Default Access Group
# (NOTE: blank lines are not allowable in a group record)
##########################
User-agent: *
# Disable access to Discovery search and filters; admin pages; processes; submission; workspace; workflow & profile page
Disallow: /search
Disallow: /admin/*
Disallow: /processes
Disallow: /submit
Disallow: /workspaceitems
Disallow: /profile
Disallow: /workflowitems
# NOTE: The default robots.txt also includes a large number of recommended settings to avoid misbehaving bots.
# For brevity, they have been removed from this example, but can be found in src/robots.txt.ejs
WARNING: for your additional disallow statements to be recognized under the User-agent: * group, they cannot be separated by blank lines from the declared User-agent: * block. A blank line indicates the start of a new user agent block. Without a leading User-agent declaration on the first line, blocks are ignored. Comment lines are allowed and will not break the user-agent block.
This is OK:
User-agent: *
# Disable access to Discovery search and filters; admin pages; processes
Disallow: /search
Disallow: /admin/*
Disallow: /processes
This is not OK, as the two lines at the bottom will be completely ignored.
User-agent: *
# Disable access to Discovery search and filters; admin pages; processes
Disallow: /search

Disallow: /admin/*
Disallow: /processes
To identify if a specific user agent has access to a particular URL, you can use a robots.txt tester.
For more information on the robots.txt format, please see the Google Robots.txt documentation.
If you have heavily customized your metadata fields away from Dublin Core, you can modify the service which generates these elements by modifying https://github.com/DSpace/dspace-angular/blob/main/src/app/core/metadata/metadata.service.ts
These meta tags are the "Highwire Press tags" which Google Scholar recommends. If you have heavily customized your metadata fields, or wish to change the default "mappings" to these Highwire Press tags, you may do so by modifying https://github.com/DSpace/dspace-angular/blob/main/src/app/core/metadata/metadata.service.ts (see for example the "setCitationAuthorTags()" method in that service class)
Much more information is available in the Configuration section on Google Scholar Metadata Mappings.
While these URL redirects may seem harmless, they may be flagged as cloaking or spam by Google, Google Scholar and other major search engines.
This may hurt your site's search engine ranking or even cause your entire site to be flagged for removal from the search engine.
If you have these URL redirects in place, it is highly recommended to remove them immediately. If you created these redirects to facilitate capturing
download statistics in Google Analytics, you should consider upgrading to DSpace 5.0 or above, which is able to automatically record bitstream downloads
in Google Analytics (see https://github.com/DSpace/DSpace/issues/5454) without the need for any URL redirects.
For more information, please see the "Indexing Repositories: Pitfalls and Best Practices" talk from Anurag Acharya (co-creator of Google Scholar)
presented at the Open Repositories 2015 conference.
No standard or predictable way to get to item display page or full text from an OAI-PMH record, making effective indexing and presenting
meaningful results difficult.
In most cases provides only access to simple Dublin Core, a subset of available metadata.
NOTE: Back in 2008, Google officially announced they were retiring support for OAI-PMH based Sitemaps. So, OAI-PMH will no longer help you
get better indexing through Google. Instead, you should be using the DSpace 'generate-sitemaps' feature described above.
Google Scholar Metadata Mappings
While DSpace 7.0 supports Google Scholar meta tags, they are no longer configurable & are currently hardcoded into the User Interface codebase. Configurability may be coming back in a later 7.x release (based on user feedback), see https://github.com/DSpace/dspace-angular/issues/1198
Google Scholar, in crawling sites, prefers Highwire Press tags. This schema contains names which are all prefixed by the string "citation_" and provides various metadata about the article/item being indexed.
In DSpace, there is a mapping facility to connect metadata fields with these citation fields in HTML. In order to enable this functionality, the switch needs
to be flipped in dspace.cfg:
google-metadata.enable = true
Once the feature is enabled, the mapping is configured by a separate configuration file located here:
[dspace]/config/crosswalks/google-metadata.properties
This file contains name/value pairs linking meta-tags with DSpace metadata fields. E.g…
google.citation_title = dc.title
google.citation_publisher = dc.publisher
google.citation_author = dc.author | dc.contributor.author | dc.creator
There is further documentation in this configuration file explaining proper syntax in specifying which metadata fields to use. If a value is omitted for a meta-tag field, the meta-tag is simply not included in the HTML output.
The values for each item are interpolated when the item is viewed, and the appropriate meta-tags are included in the HTML head tag, on both the Brief
Item Display and the Full Item Display in the UI.
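For example, with the mappings above, an item page's HTML head might contain tags like the following (the metadata values shown here are purely illustrative):

```
<meta name="citation_title" content="A Sample Article Title" />
<meta name="citation_publisher" content="Example University Press" />
<meta name="citation_author" content="Doe, Jane" />
```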
Troubleshooting Information
You can quickly get some basic information about the DSpace version and the products supporting it by using the [dspace]/bin/dspace version
command.
$ bin/dspace version
DSpace version: 4.0-SNAPSHOT
SCM revision: da53991b6b7e9f86c2a7f5292e3c2e9606f9f44c
SCM branch: UNKNOWN
OS: Linux(amd64) version 3.7.10-gentoo
Discovery enabled.
Lucene search enabled.
JRE: Oracle Corporation version 1.7.0_21
Ant version: Apache Ant(TM) version 1.8.4 compiled on June 25 2012
Maven version: 3.0.4
DSpace home: /home/dspace
$
Validating CheckSums of Bitstreams
1 Checksum Checker
1.1 Checker Execution Mode
1.2 Checker Results Pruning
1.3 Checker Reporting
1.4 Cron or Automatic Execution of Checksum Checker
1.5 Automated Checksum Checkers' Results
1.6 Database Query
Checksum Checker
Checksum Checker is a program that can be run to verify the checksum of every item within DSpace. Checksum Checker was designed with the idea that most System Administrators will run it from cron. Depending on the size of the repository, choose the options wisely.
-p <prune> Prune old results (optionally using a specified properties file for configuration)
There are three aspects of the Checksum Checker's operation that can be configured:
Unless a particular bitstream or handle is specified, the Checksum Checker will always check bitstreams in order of the least recently checked bitstream.
(Note that this means that the most recently ingested bitstreams will be the last ones checked by the Checksum Checker.)
Limited-count mode: [dspace]/bin/dspace checker -c To check a specific number of bitstreams. The -c option is followed by an integer, the number of bitstreams to check. Example: [dspace]/bin/dspace checker -c 10 This is particularly useful for checking that the checker is executing properly. The Checksum Checker's default execution mode is to check a single bitstream, as if the option was -c 1
Duration mode: [dspace]/bin/dspace checker -d To run the checker for a specific period of time, with a time argument. You may use any of the time arguments below. Example: [dspace]/bin/dspace checker -d 2h (the checker will run for 2 hours)
s Seconds
m Minutes
h Hours
d Days
w Weeks
y Years
The checker will keep starting new bitstream checks for the specified duration, so the actual execution duration will be slightly longer than the specified duration. Bear this in mind when scheduling checks.
Specific Bitstream mode: [dspace]/bin/dspace checker -b Checker will only look at the internal bitstream IDs. Example: [dspace]
/bin/dspace checker -b 112 113 4567 Checker will only check bitstream IDs 112, 113 and 4567.
Specific Handle mode: [dspace]/bin/dspace checker -a Checker will only check bitstreams within the Community, Collection or the Item itself. Example: [dspace]/bin/dspace checker -a 123456/999 Checker will only check this handle. If it is a Collection or Community, it will run through the entire Collection or Community.
Looping mode: [dspace]/bin/dspace checker -l or [dspace]/bin/dspace checker -L There are two modes. The lowercase 'el' (-l) specifies to check every bitstream in the repository once. This is recommended for smaller repositories that are able to loop through all their content in just a few hours at most. An uppercase 'L' (-L) specifies to continuously loop through the repository. This is not recommended for most repository systems. Cron jobs: for large repositories that cannot be completely checked in a couple of hours, we recommend the -d option in cron.
Pruning mode: [dspace]/bin/dspace checker -p The Checksum Checker will store the result of every check in the checksum_history
table. By default, successful checksum matches that are eight weeks old or older will be deleted when the -p option is used. (Unsuccessful ones
will be retained indefinitely). Without this option, the retention settings are ignored and the database table may grow rather large!
The retention policy may be configured in one of two ways:
1. Editing the retention policies in [dspace]/config/dspace.cfg. See Chapter 5 Configuration for the property keys. OR
2. Pass in a properties file containing retention policies when using the -p option. To do this, create a file with the following two property keys:
checker.retention.default = 10y
checker.retention.CHECKSUM_MATCH = 8w
You can use the table above for your time units. At the command line: [dspace]/bin/dspace checker -p retention_file_name
<ENTER>
Checker Reporting
Checksum Checker uses log4j to report its results. By default it will report to a log called [dspace]/log/checker.log, and it will report only on
bitstreams for which the newly calculated checksum does not match the stored checksum. To report on all bitstreams checked regardless of outcome, use
the -v (verbose) command line option:
[dspace]/bin/dspace checker -l -v (This will loop through the repository once and report in detail about every bitstream checked.)
To change the location of the log, or to modify the prefix used on each line of output, edit the [dspace]/config/templates/log4j.properties file
and run [dspace]/bin/install_configs.
Unix, Linux, or macOS. You can schedule it by adding a cron entry similar to the following to the crontab for the user who installed DSpace:
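As a sketch (assuming a standard five-field crontab, and with [dspace] replaced by your actual installation directory), such an entry could look like:

```
# Run the Checksum Checker every Sunday at 4:00 a.m. for 2 hours, pruning old results
0 4 * * 0 [dspace]/bin/dspace checker -d 2h -p
```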
The above cron entry would schedule the checker to run every Sunday at 4:00 a.m. for 2 hours. It also specifies to 'prune' the database based on the retention settings in dspace.cfg.
Windows OS. You will be unable to use the checker shell script. Instead, you should use Windows Schedule Tasks to schedule the following command to
run at the appropriate times:
Command used: [dspace]/bin/dspace checker-emailer
-d or --Deleted Send E-mail report for all bitstreams set as deleted for today.
-m or --Missing Send E-mail report for all bitstreams not found in assetstore for today.
-c or --Changed Send E-mail report for all bitstreams where checksum has been changed for today.
-n or --Not Processed Send E-mail report for all bitstreams set to no longer be processed for today.
-h or --help Help
You can also combine options (e.g. -m -c) for combined reports.
Cron. Follow the same steps above as you would running checker in cron. Change the time but match the regularity. Remember to schedule this after
Checksum Checker has run. For an example cron setup, see Scheduled Tasks via Cron.
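A sketch of a matching crontab entry (the time shown is illustrative; schedule it after the checker run above):

```
# E-mail a report of missing and changed bitstreams every Sunday at 6:00 a.m.
0 6 * * 0 [dspace]/bin/dspace checker-emailer -m -c
```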
Database Query
A query like the following can be used to check the results of the checker (Postgres):
SELECT *
FROM checksum_history
WHERE date_trunc('day', process_start_date) = CURRENT_DATE
AND result != 'CHECKSUM_MATCH'
AND result != 'BITSTREAM_MARKED_DELETED';
A more detailed query, which also reports each failing bitstream's storage path and the handles of its owning Item and Collection:
SELECT
ch.process_start_date,
ch.process_end_date,
ch.result,
ch.checksum_expected,
ch.checksum_calculated,
b.bitstream_id,
bfr.short_description,
b.store_number,
substring(b.internal_id for 2) || '/' || substring(b.internal_id from 3 for 2) || '/' || substring(b.internal_id from 5 for 2) || '/' || b.internal_id AS bitstream_path,
hi.handle AS item_handle,
hc.handle AS collection_handle
FROM checksum_history ch
JOIN bitstream b
ON ch.bitstream_id = b.uuid
JOIN bitstreamformatregistry bfr
ON b.bitstream_format_id = bfr.bitstream_format_id
LEFT JOIN bundle2bitstream bb
ON b.uuid = bb.bitstream_id
LEFT JOIN item2bundle ib
ON bb.bundle_id = ib.bundle_id
LEFT JOIN item i
ON ib.item_id = i.uuid
LEFT JOIN handle hi
ON i.uuid = hi.resource_id
AND hi.resource_type_id = 2
LEFT JOIN handle hc
ON i.owning_collection = hc.resource_id
AND hc.resource_type_id = 3
WHERE ch.result != 'CHECKSUM_MATCH'
AND date_trunc('day', process_start_date) = CURRENT_DATE
ORDER BY ch.check_id DESC;
DSpace Development
This section contains information on how to modify, extend and customize the DSpace source code.
475
User Interface Design Principles & Accessibility
These guidelines help ensure that all DSpace components have a consistent layout and follow the essential Web Content Accessibility Guidelines (WCAG).
These guidelines MUST be followed by anyone who wants to contribute to the project. See also our Code Contribution Guidelines.
Overview
Terminology used in this page
Guiding Principles
User Interface Design Guidelines
User Interface Accessibility Guidelines
Overview
These guidelines apply primarily to the "Base Theme" for the DSpace User Interface.
Base Theme (/src/app/ directories): The primary look & feel of DSpace (e.g. HTML layout, header/footer, etc) is defined by the HTML5
templates under this directory. Each HTML5 template is stored in a subdirectory named for the Angular component where that template is used.
The base theme includes very limited styling (CSS, etc), based heavily on default Bootstrap (4.x) styling, and only allowing for minor tweaks to
improve accessibility (e.g. default Bootstrap's color scheme does not have sufficient color contrast)
Two additional themes are provided with DSpace out-of-the-box
Custom Theme (/src/themes/custom directories): This directory acts as the scaffolding or template for creating a new custom
theme. It provides (empty) Angular components/templates which allow you to change the theme of individual components. Since all files
are empty by default, if you enable this theme (without modifying it), it will look identical to the Base Theme.
DSpace Theme (/src/themes/dspace directories): This is the default theme for DSpace 7. It is a very simple example theme
providing a custom color scheme & homepage on top of the Base Theme.
More information on themes (in general) can be found in the User Interface Customization documentation
Guiding Principles
All templates in the Base Theme (/src/app directories) should only use default Bootstrap styling. Documentation at: https://getbootstrap.com/docs/4.6/getting-started/introduction/
Exceptions may be made for accessibility purposes. For example, Bootstrap notes their default color scheme does not always have
sufficient color contrast
When Bootstrap Components (accordion, dropdown, etc.) are required you MUST use the included ng-bootstrap library. Documentation at: https://ng-bootstrap.github.io/#/components/accordion/examples
The use of the Bootstrap framework can help in achieving some WCAG goals such as ‘Visual Presentation’ (AAA), 'Parsing' (A), ‘Orientation’ (AA), ‘Reflow’
(AA) and ‘Text Spacing’ (AA). See the Bootstrap chapter ‘Accessibility’ for an explanation of WCAG and where to find additional information.
If that is not possible (e.g. a small button with an icon), always use the ‘name’ and ‘title’ properties.
Use the tooltip component when you need a better explanation of a button functionality. For example:
For UI elements on public pages that are only visible to users with elevated privileges use an inverted color scheme btn-dark .
For the ‘anchor’ ( the ‘<a>’ element) that uses the ‘btn’ Bootstrap CSS class always use btn-outline-primary .
For the main action button use the Bootstrap CSS class btn-primary .
For buttons like ‘Cancel’ or ‘Back’ use the Bootstrap CSS class btn-outline-secondary .
For buttons that open a dropdown list use the Bootstrap CSS class btn-secondary .
In a button series, or group, only one ‘<button>’ has the Bootstrap CSS class btn-primary .
The button order, inside a group or a series, should follow the kind of action each performs: at the left side the most ‘light’ action, like ‘Back’ or ‘Cancel’; at the right side the most ‘impactful’ action, like ‘Delete’ or ‘Remove’.
Here is an example:
To inform the user that a certain section of the page is about to be modified as a result of an action (e.g. a page change to move forward in a list)
it's necessary to make an animated waiting icon appear:
All searches that do not return a result must report their absence via a message within a block (e.g. ‘<div>’) with the Bootstrap CSS class 'alert-info'.
The items in the horizontal top navigation menu are links to pages always available to the user (logged in or not).
The items in the left vertical menu are links concerning management, administration and creation or modification of DSpace items. The list can be
different according to the permissions of the logged user.
If a page topic has more logical subdivisions, it's opportune to separate them in more tabs (e.g. the 'Edit collection' page where you can edit
metadata, roles and policies).
Inside the Extended Footer you can insert information about the institution, partnerships, social links, external links or legal information.
Notifications
All the notifications with the whole page scope should be placed on the top right side of the page, hovering on the elements beneath it:
Notifications types and color schemes: the corresponding Bootstrap CSS class must be applied:
Apart from warnings about changes in behavior, all notifications must have a close button to dismiss them. Timed notifications are allowed only when the notification is purely informational and there is no possibility of interaction (i.e. no buttons or forms within the notification).
For the most common actions and alerts, the following free FontAwesome icons are recommended:
<html lang="en">
...
</html>
Other regions of a page can be the header, footer, the page content, etc. Remember that ‘aria-label’ should be used only when there is no other
element in the HTML page that can describe better the element itself. In that case use the ‘aria-labelledby’:
It can also be used on a simple text field to provide a label in a situation where there is no text available for a dedicated label but there is other
text on the page that can be used to accurately label the control. Ex.:
It’s possible to use them to provide labels to user interface controls (ex.: buttons or inputs in a form).
For all User Interface components, ‘name’ and ‘role’ must be determined programmatically. To do this use:
Plain text with a full description where possible. This will help people with cognitive disabilities who may not immediately know the
purpose of the field because the label used by the author is not familiar to them;
Label element;
ARIA label and ARIA labelledby.
Identify programmatically the purpose of the inputs using the guidelines described above and the attribute ‘autocomplete’:
This property is useful to browsers / user agents to identify the content and provide auto-fill capabilities. The values you can use with
'autocomplete' are described here:
https://www.w3.org/TR/WCAG21/#input-purposes
Including the text of the visible label as part of the accessible name. When speech recognition software processes speech input and looks for
matches, it uses the ‘accessible name' of controls, so it’s important that what the user reads in label or description is, at least partially, what is
defined in the ‘accessible name' like ‘aria-label’ or ‘aria-labelledby’. E.g. if a button has a visible value of ‘search’ and its ‘aria-label’ has ‘go’ a
problem can occur when the user says 'click Search’ :
<button aria-label="Go">Search</button>
So, if you have an ‘accessible name' available, you can expand it using the label text inside it. All of the following examples are valid:
<button>Search</button>
<button aria-label="Search for matches"><i class="fa fa-search"></i></button>
Order of focus: For example, in a form, use ‘tabindex’ logically (e.g. street number after street name).
Change the color of an element when it receives FOCUS: e.g. CSS can be used to apply a different color when link elements receive focus.
Ensure that the information conveyed by color differences is also available in the text; e.g. links also underlined or mandatory form fields
highlighted with an asterisk (*);
Error identification: The element in error is identified and described by text even with client-side controls. Use the property 'aria-invalid="true"' inside that element. For example, within a form, apply client-side validations to the input fields and make sure that any error message is comprehensible; where possible, suggestions on how to correct the error should be provided to the user.
Provide users with sufficient time to read and use the content. For example, inside the timed notifications (error or success messages) provide a
button to stop the timer.
REST API
Overview
REST Contract / Documentation
Finding which REST API Endpoint to use
REST Configuration
REST Spring Boot Configuration
Technical Design
DSpace Demo REST-API HAL Browser
DSpace Python REST Client Library
Overview
The REST API for DSpace is provided as part of the "server" webapp ([dspace-source]/dspace-server-webapp/). It is available on the `/api/`
subpath of that webapp (i.e. ${dspace.server.url}/api/), though a human browseable/searchable interface (using the HAL Browser) is also
available at the root path (i.e. ${dspace.server.url}).
This contract provides detailed information on how to interact with the API, what endpoints are available, etc. All features/capabilities of the DSpace UI are
available in this API.
First, it's important to be aware that every single action in the User Interface can be done in the REST API. So, if you can achieve something in the
User Interface, then it's also possible to do via the REST API.
Authentication: https://github.com/DSpace/RestContract/blob/dspace-7_x/authentication.md
CSRF Tokens (required for all non-GET requests): https://github.com/DSpace/RestContract/blob/dspace-7_x/csrf-tokens.md
Submission via REST API: https://github.com/DSpace/RestContract/blob/dspace-7_x/submission.md
Search via REST API (across all object types): https://github.com/DSpace/RestContract/blob/dspace-7_x/search-endpoint.md
Some endpoints also provide a "/search" subpath: https://github.com/DSpace/RestContract/blob/dspace-7_x/search-rels.md
1. Open the DSpace User Interface in your browser window. You can even use our Demo Site (https://demo.dspace.org/) if you don't have the User
Interface installed or running locally.
2. In your Browser, open the "Developer Tools"
a. In Chrome, go to "More Tools → Developer Tools".
b. In Firefox, go to "Web Developer → Web Developer Tools".
c. In Microsoft Edge, go to "More Tools → Developer Tools".
3. Once in "Developer Tools", open the "Network" tab. This tab will provide information about every single call that the User Interface makes to the
REST API.
4. Now, perform an action or use a feature in the User Interface in your browser window.
5. Analyze what calls were just sent to the REST API in your "Network" tab. Those are the exact REST API endpoints that were used to perform
that action.
a. NOTE: Some actions may use multiple endpoints.
6. Finally, lookup the documentation for those endpoint(s) in the REST Contract / Documentation (see link above)
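As an illustration, once you have identified a GET endpoint in the Network tab, you can replay it with a command-line client. This sketch assumes the public demo server mentioned above; any read-only endpoint from the REST Contract works the same way:

```shell
# Fetch the first page of communities as HAL+JSON (read-only GET, no authentication needed)
curl -s -H "Accept: application/json" \
  "https://demo.dspace.org/server/api/core/communities?page=0&size=5"
```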
REST Configuration
The following REST API configurations are provided in [dspace]/config/rest.cfg and may be overridden in your local.cfg
Property: rest.cors.allowed-origins
Example Value: rest.cors.allowed-origins = ${dspace.ui.url}
Informational Note: Allowed Cross-Origin-Resource-Sharing (CORS) origins (in "Access-Control-Allow-Origin" header). Only these origins (client URLs) can successfully authenticate with your REST API via a web browser. Defaults to ${dspace.ui.url} if unspecified (as the UI must have access to the REST API). If you customize that setting, MAKE SURE TO include ${dspace.ui.url} in that setting if you wish to continue trusting the UI. Multiple allowed origin URLs may be comma separated (or this configuration can be defined multiple times). Wildcard value (*) is NOT SUPPORTED.
Keep in mind any URLs added to this setting must be an exact match with the origin: mode (http vs https), domain, port, and subpath(s) all must match. So, for example, these URLs are all considered different origins: "http://mydspace.edu", "http://mydspace.edu:4000" (different port), "https://mydspace.edu" (http vs https), "https://myapp.mydspace.edu" (different domain), and "https://mydspace.edu/myapp" (different subpath).
NOTE #1: Development or command-line tools may not use CORS and may therefore bypass this configuration. CORS does not provide
protection to the REST API / server webapp. Instead, its role is to protect browser-based clients from cookie stealing or other Javascript-based
attacks. All modern web browsers use CORS to protect their users from such attacks. Therefore DSpace's CORS support is used to protect users
who access the REST API via a web browser application, such as the DSpace UI or custom built Javascript tools/scripts.
NOTE #2: If you modify this value to allow additional UIs (or Javascript tools) to access your REST API, then you may also need to modify proxies.trusted.ipranges to trust the IP address of each UI. Modifying trusted proxies is only necessary if the X-FORWARDED-FOR header must be trusted from each additional UI. (The DSpace UI currently requires the X-FORWARDED-FOR header to be trusted). By default, proxies.trusted.ipranges will only trust the IP address of the ${dspace.ui.url} configuration.
NOTE #3: Although the subpath must match, the Origin header itself sent from Angular will never contain a subpath. So if the dspace.ui.url config property ever changes to include a subpath like /myapp, then the expected origin will need to be added to rest.cors.allowed-origins, i.e. the URL without the subpath.
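For example, to trust a second, custom-built UI alongside the default one, your local.cfg might contain the following (the second URL is hypothetical):

```
rest.cors.allowed-origins = ${dspace.ui.url}, https://myapp.mydspace.edu
```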
Property: rest.cors.allow-credentials
Example Value: rest.cors.allow-credentials = true
Informational Note: Whether or not to allow credentials (e.g. cookies) sent by the client/browser in CORS requests (in "Access-Control-Allow-Credentials" header). For DSpace, this MUST be set to "true" to support CSRF checks (which use Cookies) and external authentication via Shibboleth (and similar). Defaults to "true" if unspecified. (Requires reboot of servlet container, e.g. Tomcat, to reload.)
Property: rest.projections.full.max
Example Value: rest.projections.full.max = 2
Informational Note: This property determines the max embed depth for a FullProjection. This is also used by the SpecificLevelProjection as a fallback in case the property is defined on the bean. Usually, this should be kept as-is for best performance.
Property: rest.projection.specificLevel.maxEmbed
Example Value: rest.projection.specificLevel.maxEmbed = 5
Informational Note: This property determines the max embed depth for a SpecificLevelProjection. Usually, this should be kept as-is for best performance.
Property: rest.properties.exposed
Example Value:
rest.properties.exposed = plugin.named.org.dspace.curate.CurationTask
rest.properties.exposed = google.analytics.key
Informational Note: Define which configuration properties are exposed through the http://<dspace.server.url>/api/config/properties/ REST API endpoint. If a rest request is made for a property which exists, but isn't listed here, the server will respond that the property wasn't found. This property can be defined multiple times to allow access to multiple configuration properties. Generally speaking, it is ONLY recommended to expose configuration settings where they are necessary for the UI or client, as exposing too many configurations could be a security issue. This is why we only expose the two above settings by default.
Property: spring.servlet.multipart.max-file-size
Informational Note: Per Spring Boot docs, this setting specifies the maximum size of file that can be uploaded via Spring Boot (and therefore via the DSpace REST API). A value of "-1" removes any limit. DSpace sets this to 512MB by default.
Property: spring.servlet.multipart.max-request-size
Informational Note: Per Spring Boot docs, this setting specifies the maximum size of a single request via Spring Boot (and therefore via the DSpace REST API). That means if multiple files are uploaded at once, this is the maximum total size of all files. A value of "-1" removes any limit. DSpace sets this to 512MB by default.
Technical Design
The REST API & Server Webapp are built on Spring Boot and Spring HATEOAS, using Spring Security. It also aligns with Spring Data REST (though at
this time it doesn't use it directly because of incompatibility with the DSpace data model).
The REST API is stateless, aligns with HATEOAS (Hypermedia as the Engine of Application State) principles, returning HAL formatted JSON. This allows
the REST API to be easily browsable/interactable via third-party tools that understand HAL & HATEOAS, such as the HAL Browser. JSON Web Tokens
(JWT) are used to store state/session information between requests.
For better security, the REST API requires usage of CSRF tokens for all modifying requests.
REST API v6 (deprecated)
What is DSpace REST API (v4-v6)
Installing the REST API (v4-v6)
Disabling SSL
REST Endpoints
Index / Authentication
Shibboleth Apache configuration for the REST API
Communities
Collections
Items
Bitstreams
Handle
Hierarchy
Schema and Metadata Field Registry
Report Tools
Model - Object data types
Introduction to Jersey for developers
Configuration for DSpace REST
Recording Proxy Access by Tools
Additional Information
This documentation describes the deprecated DSpace v4-6 REST API. This old API is still available in DSpace 7, but will be removed in DSpace 8.
We highly recommend all users migrate scripts/tools to use the new REST API. This API is no longer actively supported or maintained.
DSpace 4 introduced the initial REST API, which did not allow for authentication, and provided only READ-ONLY access to publicly accessible
Communities, Collections, Items, and Bitstreams. DSpace 5 builds off of this and allows authentication to access restricted content, as well as allowing
Create, Edit and Delete on the DSpace Objects. DSpace 5 REST API also provides improved pagination over resources and searching. There has been a
minor drift between the DSpace 4 REST API and the DSpace 5 REST API, so client applications will need to be targeted per version.
# The "-Pdspace-rest" flag will build the deprecated "rest" webapp alongside the new "server" webapp
mvn clean package -Pdspace-rest
The REST API deploys as a separate "rest" webapp for your servlet container / tomcat. For example, depending on how you deploy webapps, one way
would be to alter tomcat-home/conf/server.xml and add:
In DSpace 4, the initial/official Jersey-based REST API was added to DSpace. The DSpace 4 REST API provides READ-ONLY access to DSpace Objects.
In DSpace 5, the REST API adds authentication, allows Creation, Update, and Delete to objects, can access restricted materials if authorized, and it
requires SSL.
Disabling SSL
For localhost development purposes, SSL can add getting-started difficulty, so this security requirement can be disabled. To disable the REST API's
SSL requirement, alter [dspace]/webapps/rest/WEB-INF/web.xml or [dspace-source]/dspace-rest/src/main/webapp/WEB-INF
/web.xml, comment out the <security-constraint> block, and restart your servlet container. Production deployments of the REST API should use
SSL, as authentication credentials should not travel over the internet unencrypted.
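The block to comment out looks roughly like the following sketch; the exact contents vary by version, so locate the real <security-constraint> element in your web.xml rather than copying this verbatim:

```xml
<!-- Commented out to disable the SSL requirement (local development only)
<security-constraint>
    <web-resource-collection>
        <web-resource-name>DSpace REST API</web-resource-name>
        <url-pattern>/*</url-pattern>
    </web-resource-collection>
    <user-data-constraint>
        <transport-guarantee>CONFIDENTIAL</transport-guarantee>
    </user-data-constraint>
</security-constraint>
-->
```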
REST Endpoints
The REST API is modeled after the DSpace Objects of Communities, Collections, Items, and Bitstreams. The API is not a straight database schema dump
of these entities, but provides some wrapping that makes it easy to follow relationships in the API output.
HTTP Header: Accept
Note: You must set your request header's "Accept" property to either JSON (application/json) or XML (application/xml) depending on the format you prefer
to work with.
Example usage from command line in XML format with pretty printing:
Example usage from command line in JSON format with pretty printing:
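A sketch of both invocations, assuming a local server at http://localhost:8080/rest (xmllint and python3's json.tool are external pretty-printing tools, not part of DSpace):

```shell
# XML format, pretty-printed (requires a running server):
#   curl -s -H "Accept: application/xml" http://localhost:8080/rest/communities | xmllint --format -
# JSON format, pretty-printed:
#   curl -s -H "Accept: application/json" http://localhost:8080/rest/communities | python3 -m json.tool
# The JSON pretty-printing pipeline, demonstrated here on a canned response:
echo '{"okay":true,"authenticated":false}' | python3 -m json.tool
```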
For this documentation, we will assume that the URL of the "rest" webapp is http://localhost:8080/rest/. For
production systems, this address will be slightly different, such as https://demo.dspace.org/rest/. The path to an
endpoint goes after the /rest/, such as /rest/communities; altogether this is http://localhost:8080/rest/communities
Another thing to note is that there are Query Parameters that you can tack onto the end of an endpoint to do extra
things. The most commonly used one in this API is ?expand. Instead of every API call returning every possible piece
of information about an object, each call returns a commonly used subset by default and provides the more
"expensive" information only when you deliberately request it. Each endpoint lists its available expands in
its output, but for getting started you can use ?expand=all to make the endpoint provide all of its information
(parent objects, metadata, child objects). You can include multiple expands, such as ?expand=collections,subCommunities.
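For instance, combining two expands on a single community (the ID 456 is taken from the Community Object example later on this page; the local URL is the assumed one):

```shell
# Request a community with both its collections and sub-communities expanded
community_id=456
url="http://localhost:8080/rest/communities/${community_id}?expand=collections,subCommunities"
echo "$url"
# with a running server: curl -s -H "Accept: application/json" "$url"
```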
Two other query parameters of note are limit and offset. Endpoints which return arrays of objects, such as /communities, are
"paginated": the full list is broken into "pages" which start at offset from the beginning of the list and contain at most limit elements. By repeated
queries you can retrieve any portion of the array or all of it. Offsets begin at zero. So, to retrieve the sixth through tenth elements of the full list of
Collections, you could do this:
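Because offsets are zero-based, the sixth through tenth elements correspond to offset=5 and limit=5. A sketch, with the curl call assuming a running local server:

```shell
# With a running server:
#   curl -s -H "Accept: application/json" "http://localhost:8080/rest/collections?offset=5&limit=5"
# Computing the window for any 1-based range [first, last]:
first=6; last=10
offset=$((first - 1))
limit=$((last - first + 1))
echo "offset=${offset}&limit=${limit}"
```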
Index / Authentication
REST API Authentication has changed in DSpace 6.x. It now uses a JSESSIONID cookie (see below). The previous (5.x) authentication scheme using a
rest-dspace-token is no longer supported.
POST /login Login to the REST API using a DSpace EPerson (user). It returns a JSESSIONID cookie, that can be used for future
authenticated requests.
Example Request:
# Can use either POST or GET (POST recommended). Must pass the parameters "email" and "password".
curl -v -X POST --data "[email protected]&password=mypass" https://dspace.myu.edu/rest/login
Example Response:
HTTP/1.1 200 OK
Set-Cookie: JSESSIONID=6B98CF8648BCE57DCD99689FE77CB1B8; Path=/rest/; Secure; HttpOnly
GET /shibboleth-login Login to the REST API using Shibboleth authentication. In order to work, this requires additional Apache configuration. To
authenticate, execute the following steps:
2. This should take you again to the IdP login page. You can submit this form using curl with the same cookie jar; however,
this is IdP-dependent, so we cannot provide an example here.
3. Once you submit the form using curl, you are taken back to the /rest/shibboleth-login URL, which returns the
JSESSIONID.
POST /logout Logout from the REST API, by providing a JSESSIONID cookie. After being posted this cookie will no longer work.
Example Request:
# Pass the JSESSIONID cookie returned by /login
curl -v -X POST --cookie "JSESSIONID=6B98CF8648BCE57DCD99689FE77CB1B8" https://dspace.myu.edu/rest/logout
After posting a logout request, the cookie is invalidated, and the "/status" path should show you as unauthenticated (even when
passing that same cookie). For example:
GET /test Returns string "REST api is running", for testing that the API is up.
Example Request:
curl https://dspace.myu.edu/rest/test
Example Response:
REST api is running
GET /status Receive information about the currently authenticated user token, or the API itself (e.g. version information).
{
"okay":true,
"authenticated":true,
"email":"[email protected]",
"fullname":"DSpace Administrator",
"sourceVersion":"6.0",
"apiVersion":"6"
}
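A sketch of checking the status and extracting one field; the curl call assumes a running server and a valid cookie, while the extraction step runs against the canned Status Object shown above:

```shell
# With a running server (hypothetical host; substitute your own JSESSIONID):
#   curl -s -H "Accept: application/json" --cookie "JSESSIONID=..." https://dspace.myu.edu/rest/status
# Extracting a single field from the Status Object above:
status='{"okay":true,"authenticated":true,"email":"[email protected]","fullname":"DSpace Administrator","sourceVersion":"6.0","apiVersion":"6"}'
echo "$status" | python3 -c 'import json,sys; print(json.load(sys.stdin)["apiVersion"])'
```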
<Location "/rest/shibboleth-login">
    AuthType shibboleth
    ShibRequireSession On
    # Please note that setting ShibUseHeaders to "On" is a potential security risk.
    # You may wish to set it to "Off". See the mod_shib docs for details about this setting:
    # https://wiki.shibboleth.net/confluence/display/SHIB2/NativeSPApacheConfig#NativeSPApacheConfig-AuthConfigOptions
    # Here's a good guide to configuring Apache + Tomcat when this setting is "Off":
    # https://www.switch.ch/de/aai/support/serviceproviders/sp-access-rules.html#javaapplications
    ShibUseHeaders On
    require valid-user
</Location>
2.
a.
b. This should take you again to the IdP login page. You can submit this form using curl with the same cookie jar; however, this is IdP-dependent,
so we cannot provide an example here.
c. Once you submit the form using curl, you are taken back to the /rest/shibboleth-login URL, which returns the JSESSIONID.
d. Using that JSESSIONID, check if you have authenticated successfully:
c. This should tell you whether the Shibboleth session is valid and report the number of attributes.
d. Use this cookie to obtain a Tomcat JSESSIONID:
Communities
Communities in DSpace are used for organization and hierarchy, and are containers that hold sub-Communities and Collections. (ex: Department of
Engineering)
Collections
Collections in DSpace are containers of Items. (ex: Engineering Faculty Publications)
POST /collections/find-collection - Find a collection by the passed name.
PUT /collections/{collectionId} - Update a collection. You must PUT a Collection object.
DELETE /collections/{collectionId} - Delete a collection from DSpace.
DELETE /collections/{collectionId}/items/{itemId} - Delete an item from a collection.
Items
Items in DSpace represent a "work" and combine metadata and files, known as Bitstreams.
Bitstreams
Bitstreams are files. They have a filename, a size (in bytes), and a file format. Typically in DSpace, the Bitstream will be the "full text" article, or some other
media. Some files are the actual file that was uploaded (tagged with bundleName:ORIGINAL); others are DSpace-generated derivatives or
renditions, such as extracted text or thumbnails. You can download files/bitstreams. DSpace doesn't really limit the type of files that it takes in, so this
could be PDF, JPG, audio, video, zip, or other. The logo for a Collection or a Community is also a Bitstream.
You can access the parent object of a Bitstream (normally an Item, but possibly a Collection or Community when it is its logo) through: /bitstreams/:
bitstreamID?expand=parent
Where the documentation states "You must post a ResourcePolicy" (or some other object type), it means that the XML or JSON you post in the
request body must match the structure of that data type.
Handle
In DSpace, Communities, Collections, and Items typically get minted a Handle Identifier. You can reference these objects in the REST API by their handle,
as opposed to having to use the internal item-ID.
GET /handle/{handle-prefix}/{handle-suffix} - Returns a Community, Collection, or Item object that matches that handle.
Hierarchy
Assembling a full representation of the community and collection hierarchy using the communities and collections endpoints can be inefficient. The
/hierarchy endpoint instead retrieves a lightweight representation of the nested community and collection hierarchy, in which each node contains only
minimal information (id, handle, name).
GET /hierarchy - Retrieve a lightweight representation of the nested community and collection hierarchy.
GET /registries/schema/{schema_prefix}/metadata-fields/{element} - Returns the metadata field within a schema with an unqualified element name
GET /registries/schema/{schema_prefix}/metadata-fields/{element}/{qualifier} - Returns the metadata field within a schema with a qualified
element name
PUT /registries/metadata-fields/{field_id} - Update the specified metadata field
DELETE /registries/metadata-fields/{field_id} - Delete the specified metadata field from the metadata field registry
DELETE /registries/schema/{schema_id} - Delete the specified schema from the schema registry
Note: since the schema object contains no data fields, the following method has not been implemented: PUT /registries/schema/{schema_id}
Report Tools
These reporting tools allow a repository manager to audit a collection for metadata consistency and bitstream consistency. See REST Based Quality
Control Reports for more information.
GET /reports - Return a list of report tools built on the rest api
GET /filters - Return a list of use case filters available for quality control reporting
GET /filtered-collections - Return collections and item counts based on pre-defined filters
GET /filtered-collections/{collection_id} - Return items and item counts for a collection based on pre-defined filters
GET /filtered-items - Retrieve a set of items based on a metadata query and a set of filters
Community Object
{
  "id": 456,
  "name": "Reports Community",
  "handle": "10766/10213",
  "type": "community",
  "link": "/rest/communities/456",
  "expand": ["parentCommunity", "collections", "subCommunities", "logo", "all"],
  "logo": null,
  "parentCommunity": null,
  "copyrightText": "",
  "introductoryText": "",
  "shortDescription": "Collection contains materials pertaining to the Able Family",
  "sidebarText": "",
  "countItems": 3,
  "subcommunities": [],
  "collections": []
}
Collection Object
Item Object
Bitstream Object
ResourcePolicy Object
[{
  "id": 317127,
  "action": "READ",
  "epersonId": -1,
  "groupId": 0,
  "resourceId": 47166,
  "resourceType": "bitstream",
  "rpDescription": null,
  "rpName": null,
  "rpType": "TYPE_INHERITED",
  "startDate": null,
  "endDate": null
}]
MetadataEntry Object
User Object
{"email":"[email protected]","password":"pass"}
Status Object
Introduction to Jersey for developers
The REST API for DSpace is implemented using Jersey, the reference implementation of the Java standard for building RESTful web services (JAX-RS
1). That means this API should be easier to expand and maintain than other API approaches, as this approach has been widely adopted in the industry. If
this client documentation does not fully explain how an endpoint works, it is helpful to look directly at the Java REST API code to see how it is
implemented. The code typically shows the required parameters, the optional parameters, and the type of data that will be returned.
There is no central ProviderRegistry in which you have to declare your paths. Instead, the code is driven by annotations. Here is a list of annotations used
in the code of CommunitiesResource.java:
@Path("/communities"), which routes requests for http://localhost:8080/rest/communities to this class; this is then the base path for all the requests within
this class.
@GET, which indicates that this method responds to GET http requests
@POST, which indicates that this method responds to POST http requests
@PUT, which indicates that this method responds to PUT http requests
@DELETE, which indicates that this method responds to DELETE http requests
@Path("/{community_id}"), which is appended to the class-level @Path above; this one uses a path variable, {community_id}. The full endpoint
would be http://localhost:8080/rest/communities/123, where 123 is the ID.
@Consumes({ MediaType.APPLICATION_JSON, MediaType.APPLICATION_XML }), which indicates that this request expects input of either JSON
or XML. Another endpoint accepts HTML input.
@PathParam("community_id") Integer communityId, which maps the path placeholder variable {community_id} to the Java Integer communityId
@QueryParam("userIP") String user_ip, which maps a query parameter like ?userIP=8.8.4.4 to the Java String variable user_ip, so that user_ip == "8.8.4.4"
Informational Note: Boolean value indicating whether statistics should be recorded for access via the REST API; defaults to 'false'.
http://localhost:8080/rest/items/:ID?userIP=ip&userAgent=userAgent&xforwardedfor=xforwardedfor
If no parameters are given, the details of the HTTP request's sender are used in statistics. This enables tools to record the details of their user
rather than themselves.
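A sketch of the request a proxy tool might build on behalf of its end user (hypothetical item ID and client details; the local URL is the assumed one):

```shell
# Record the end user's details in statistics instead of the proxy tool's own
item_id=123
url="http://localhost:8080/rest/items/${item_id}?userIP=8.8.4.4&userAgent=MyTool/1.0&xforwardedfor=8.8.8.8"
echo "$url"
# with a running server: curl -s "$url"
```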
Additional Information
Additional information can be found in the README for dspace-rest, and in the GitHub Pull Request for DSpace REST (Jersey).
REST Based Quality Control Reports
Tutorial
Summary
API Calls Used in these Reports
Report Screen Shots
Collection QC Report
Metadata Query Report
Installation and Configuration
Installing in DSpace 6
Disabling the REST Reports
Configuring Access of the Reporting Tools
Configure the REST Reports that can be requested by name
Configure Item handle resolution
Enable User Authentication (Password AuthN only) for REST reports
Configure the database-specific format for a regex expression
Configure the sets of filters of interest to your repository managers
Other filter configuration settings
Enabling Sort-able Report Tables
Installing in DSpace 5
DSpace 7.0 only supports this when using the older, deprecated REST API v6
In DSpace 7.0, REST Quality Control Reports are currently only supported if you also install the old REST API v6 (deprecated) webapp. Tentative plans to
migrate these reports to support the new REST API have begun in https://github.com/DSpace/DSpace/issues/7641
Tutorial
DSpace REST Report Tool Tutorial
The following repository contains a tutorial demonstrating the usage of the REST Based Report Tools: https://github.com/terrywbrady/restReportTutorial/blob/master/README.md
Summary
These reports utilize the DSpace REST API to give a Collection Manager tools for auditing the metadata and bitstream consistency of collections and items.
When deploying the DSpace REST API, an institution may choose to make the API publicly accessible or to restrict access to the API.
If these reports are deployed in a protected manner, the reporting tools can be configured to bypass DSpace authorization when reporting on collections
and items.
Collection QC Report
REST Reports - Collection Report Screenshots with Annotated API Calls
Installing in DSpace 6
This code is part of the DSpace 6 code base.
Enable/disable report resources in the REST API
<servlet-mapping>
<servlet-name>default</servlet-name>
<url-pattern>/static/*</url-pattern>
</servlet-mapping>
Bypassing authorization checks allows collection owners to view the status of all items in the repository without authenticating through the REST API. This
option is recommended if you have secured access to your REST API.
If your REST API is publicly accessible, deploy the reports with anonymous access and consider providing an authorization token for access to the report
calls.
this.ROOTPATH = "/handle/"
Enable/Disable Password AuthN
# The REST Report Tools may pass a regular expression test to the database.
# The following configuration setting will construct a SQL regular expression test appropriate to your database engine
rest.regex-clause = text_value ~ ?
# A filter contains a set of tests that will be applied to an item to determine its inclusion in a particular report.
# Private items and withdrawn items are frequently excluded from DSpace reports.
# Additional filters can be configured to examine other item properties.
# For instance, items containing an image bitstream often have different requirements from an item containing a PDF.
# The DSpace REST reports come with a variety of filters that examine item properties, item bitstream properties,
# and item authorization policies. The existing filters can be used as an example to construct institution-specific
# filters that will test conformity to a set of institutional policies.
# plugin.sequence.org.dspace.rest.filter points to a list of classes that contain available filters.
# Each class must implement the ItemFilterList interface.
#   ItemFilterDefs: Filters that examine simple item and bitstream type properties
#   ItemFilterDefsMisc: Filters that examine bitstream mime types and dependencies between bitstreams
#   ItemFilterDefsMeta: Filters that examine metadata properties
#   ItemFilterDefsPerm: Filters that examine item and bitstream authorization policies
plugin.sequence.org.dspace.rest.filter.ItemFilterList = \
    org.dspace.rest.filter.ItemFilterDefs,\
    org.dspace.rest.filter.ItemFilterDefsMisc,\
    org.dspace.rest.filter.ItemFilterDefsPerm
#   org.dspace.rest.filter.ItemFilterDefsMeta,\
Installing in DSpace 5
This feature is not a part of the DSpace 5 code base. Please see the following notes to enable a DSpace 5 compatible version of these reports.
1. Install https://github.com/DSpace/DSpace/pull/1568
2. Change the following code in restCollReport.js and restQuery.js to pull the correct id for each DSpace Object
REST Reports - Collection Report Screenshots with Annotated API Calls
DSpace 7.0 only supports this when using the older, deprecated REST API v6
In DSpace 7.0, REST Quality Control Reports are currently only supported if you also install the old REST API v6 (deprecated) webapp. Tentative plans to
migrate these reports to support the new REST API have begun in https://jira.lyrasis.org/browse/DS-4301.
/rest/filtered-collections?limit=25&expand=topCommunity&offset=0
/rest/filters
View Filtered Counts
API Call
/rest/filtered-collections/{collection_id}?limit=500&filters=has_multiple_originals.has_one_original
View Items of Interest
API Call
/rest/filtered-collections/{collection_id}?expand=items&limit=100&filters=has_one_original&offset=0
/rest/registries/schema
/rest/filtered-collections/{collection_id}?expand=items,metadata&limit=100&filters=has_one_original&offset=0&show_fields[]=dc.date.created&show_fields[]=dc.date.issued
Download CSV File for Metadata Update
REST Reports - Metadata Query Screenshots with Annotated API Calls
DSpace 7.0 only supports this when using the older, deprecated REST API v6
In DSpace 7.0, REST Quality Control Reports are currently only supported if you also install the old REST API v6 (deprecated) webapp. Tentative plans to
migrate these reports to support the new REST API have begun in https://jira.lyrasis.org/browse/DS-4301.
/rest/hierarchy
Multiple Metadata Fields can be Queried
API Call
/rest/registries/schema
API Call
/rest/filters
Select Additional Fields to Display
API Call
/rest/registries/schema
View Results
API Call
/rest/filtered-items?query_field[]=dc.subject.*&query_field[]=dc.creator&query_op[]=contains&query_op[]=matches&query_val[]=politic&query_val[]=.*Krogh.*&collSel[]=&limit=100&offset=0&expand=parentCollection,metadata&filters=is_withdrawn,is_discoverable&show_fields[]=dc.subject&show_fields[]=dc.subject.other
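The bracketed array parameters pair up by position: the first query_field[] combines with the first query_op[] and query_val[] to form one predicate, the second with the second, and so on. A sketch of assembling such a query string, using the values from the example above:

```shell
# Each (field, op, value) triplet forms one metadata predicate
q1="query_field[]=dc.subject.*&query_op[]=contains&query_val[]=politic"
q2="query_field[]=dc.creator&query_op[]=matches&query_val[]=.*Krogh.*"
echo "/rest/filtered-items?${q1}&${q2}&limit=100&offset=0"
```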
Export as CSV for DSpace Metadata Update Process
REST Reports - Summary of API Calls
GET /rest - Summary of API Calls
GET /rest/reports - List of Available Reports
GET /rest/reports/[report name] - Redirect to a Specific Report
GET /rest/filters - Return filters to apply to a list of items
GET /rest/filtered-collections - Return collections and item counts based on pre-defined filters
GET /rest/filtered-collections/{collection_id} - Return items and item counts for a collection based on pre-defined filters
GET /rest/filtered-items - Retrieve a set of items based on a metadata query and a set of filters
DSpace 7.x only supports this when using the older, deprecated REST API v6
In DSpace 7.0, REST Quality Control Reports are currently only supported if you also install the old REST API v6 (deprecated) webapp. Tentative plans to
migrate these reports to support the new REST API have begun in https://github.com/DSpace/DSpace/issues/7641
GET /reports - Return a list of report tools built on the rest api
GET /filters - Return a list of use case filters available for quality control reporting
GET /filtered-collections - Return collections and item counts based on pre-defined filters
GET /filtered-collections/{collection_id} - Return items and item counts for a collection based on pre-defined filters
GET /filtered-items - Retrieve a set of items based on a metadata query and a set of filters
collection: /rest/static/index.html
item-query: /rest/static/query.html
GET /rest/filtered-collections - Return collections and item counts based on pre-defined filters
This request is similar to the call /rest/collections except that it allows the user to supply a comma separated list of filters to apply to the collection.
GET /rest/filtered-collections/{collection_id} - Return items and item counts for a collection based
on pre-defined filters
This request is similar to the call /rest/collections/{collection_id} except that it allows the user to supply a comma separated list of filters to apply to the
collection.
When combined with the expand=items parameter, this call will return the set of items that match a filter or set of filters. It may be necessary to paginate
through these results.
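For example, a sketch of paging through the matches 100 at a time (hypothetical collection ID):

```shell
# Walk the filter matches in pages of 100 by advancing the offset
collection_id=123
for offset in 0 100 200; do
  echo "/rest/filtered-collections/${collection_id}?expand=items&limit=100&filters=has_one_original&offset=${offset}"
done
```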
GET /rest/filtered-items - Retrieve a set of items based on a metadata query and a set of filters
This request allows a collection owner to construct a complex metadata query against specific metadata fields, applying a number of comparison operators.
While the search features of DSpace allow an end user to discover items via search, this command allows the collection owner to audit and enforce metadata
consistency within a collection.
Field(s) to be searched
Search operator
Search value (if applicable)
Collection scope (optional)
Comma separated list of collections to search
Filters
A comma separated list of item filters that will be applied to all results.
It may be necessary to paginate through results when applying a highly selective filter
Advanced Customisation
If you are looking for ways to override specific classes or resources in DSpace (specifically in the backend), this page provides a guide for how to do so.
1 Additions module
2 Server Webapp Overlay
3 Rest (Deprecated) Webapp Overlay
Additions module
Location: [dspace-source]/dspace/modules/additions/
This module may be used to store dspace-api changes, custom plugins, etc. Classes placed in [dspace-source]/dspace/modules/additions will
override those located in the [dspace-source]/dspace-api
This module may be used to override classes across all webapps located in the [dspace-source]/dspace/modules/ directory, as well as in the
command line interface. Therefore, this module is for global overrides only. If you have overrides specific to a single webapp, use the "Maven WAR
Overlays" option below.
This module overlay directory allows you to override any classes, resources or files available (by default) in the Server Webapp. This includes overriding
files of any of the following source directories:
Java classes placed in [dspace-source]/dspace/modules/server/ will override classes (of the same path/name) in any of the above modules.
You can also override resources (i.e. any files under a /src/main/resources/ directory) which are embedded in one of the JARs by putting them under
[dspace-source]/dspace/modules/server/src/main/resources/. For example, to override the "[dspace-source]/dspace-oai/src/main/resources/templates/index.twig.html"
file embedded in dspace-oai.jar, you would place your own version at [dspace-source]/dspace/modules/server/src/main/resources/templates/index.twig.html.
This results in the resource/file being copied into the WEB-INF/classes/ subdirectory of the "server" webapp, where it will override any file of the
same name embedded in a JAR (per Servlet Spec 3.0).
If you have chosen to install the deprecated REST API v6 webapp, you can similarly override any classes/files of that separate webapp by placing those
files in the [dspace-source]/dspace/modules/rest/ directory
DSpace Service Manager
1 Introduction
2 Configuration
2.1 Configuring Addons to Support Spring Services
2.2 Configuration Priorities
2.2.1 Configuring a new Addon
2.2.1.1 Addon located as resource in jar
2.2.1.2 Addon located in the [dspace]/config/spring directory
2.2.2 The Core Spring Configuration
2.2.3 Utilizing Autowiring to minimize configuration complexity.
2.3 Accessing the Services Via Service Locator / Java Code
3 Architectural Overview
3.1 Service Manager Startup in Webapplications and CLI
4 Tutorials
Introduction
The DSpace Spring Service Manager supports overriding configuration at many levels.
Configuration
This latter method requires the addon to implement a SpringLoader to identify the location in which to look for Spring configuration, and to place
configuration files into that location. This can be seen inside the current [dspace-source]/config/modules/spring.cfg
Configuration Priorities
The ordering of the loading of Spring configuration is the following:
api: when placed in this module the Spring files will always be processed into services (since all of the DSpace modules are dependent on the
API).
discovery: when placed in this module the Spring files will only be processed when the discovery library is present
The reason why there is a separate directory is that if a service cannot be loaded, the kernel will crash and DSpace will not start.
So you do indeed need to create a new directory in [dspace]/config/spring. Next, you need to create a class that implements the
org.dspace.kernel.config.SpringLoader interface. This interface contains only one method, named getResourcePaths(), which is currently implemented in the following manner:
@Override
public String[] getResourcePaths(ConfigurationService configurationService) {
    StringBuffer filePath = new StringBuffer();
    filePath.append(configurationService.getProperty("dspace.dir"));
    filePath.append(File.separator);
    filePath.append("config");
    filePath.append(File.separator);
    filePath.append("spring");
    filePath.append(File.separator);
    filePath.append("{module.name}"); //Fill in the module name in this string
    filePath.append(File.separator);
    try {
        // By adding the XML_SUFFIX here, it doesn't matter if there is some kind of
        // spring.xml.old file in there; it will only load in the active ones.
        return new String[]{new File(filePath.toString()).toURI().toURL().toString() + XML_SUFFIX};
    } catch (MalformedURLException e) {
        return new String[0];
    }
}
After the class has been created, you will also need to add it to the "spring.springloader.modules" property located in [dspace]/config/modules/spring.cfg.
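A hypothetical entry (the class name here is illustrative; check your spring.cfg for the exact property key in use):

```properties
# Comma-separated list of SpringLoader implementations to process at startup
springloader.modules = org.dspace.kernel.config.ExampleSpringLoader
```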
The Spring service manager checks this property to ensure that only the interface implementations whose classes it can find are loaded.
This gives developers some flexibility: they can always create their own Spring modules, and Spring will not crash when it
cannot find a certain class.
Architectural Overview
Please see Architectural Overview here: DSpace Services Framework
Tutorials
Several good Spring / DSpace Services Tutorials are already available:
Curation Tasks
1 Writing your own tasks
2 Task Output and Reporting
2.1 Status Code
2.2 Result String
2.3 Reporting Stream
2.4 Accessing task output in calling code
3 Task Properties
4 Task Annotations
5 Scripted Tasks
5.1 Interface
5.1.1 performDso() vs. performId()
This documentation provides a guide for how to programmatically create Curation Tasks. For more information on configuring Curation Tasks, see the
Curation System section of the documentation.
First, a task must provide a no-argument constructor, so it can be loaded by the PluginManager. Thus, all tasks are 'named' plugins, with the task name
being the plugin name.
The CurationTask interface is almost a "tagging" interface, and only requires a few very high-level methods to be implemented. The most significant is
the perform method. If a task extends the AbstractCurationTask class, that is the only method it needs to define.
Status Code
This is returned to CS by any of a task's perform methods. The complete list of values, defined in Curator, is:
-2 CURATE_UNSET task did not return a status code because it has not yet run
In the administrative UI, this code is translated into the word or phrase configured by the ui.statusmessages property (discussed in Curation System) for
display.
Result String
The task may set a string indicating details of the outcome:
CS does not interpret or assign result strings; that is entirely up to the task. A task may choose not to assign a result, but the best practice is to assign one
whenever possible. Code which invokes Curator.getResult() may use the result string for display or any other purpose.
Reporting Stream
For very fine-grained information, a task may write to a reporting stream. Unlike the result string, there is no limit to the amount of data that may be pushed
to this stream.
Task Properties
Task code may configure itself using ConfigurationService in the normal manner, or by the use of "task properties". See Curation System - Task Properties
for discussion of the issues for which task properties were invented. Any code which extends AbstractCurationTask has access to its configured task
properties.
Task Annotations
CS looks for, and will use, certain Java annotations in the task class definition that can help it invoke tasks more intelligently. An example may explain
best. Since tasks operate on DSOs that can be either simple (Items) or containers (Collections and Communities), there is a fundamental ambiguity
in how a task is invoked: if the DSO is a collection, should CS invoke the task on each member of the collection, or does the task "know"
how to do that itself? The decision is made by looking for the @Distributive annotation: if present, CS assumes that the task will manage the details;
otherwise CS will walk the collection and invoke the task on each member. The Java class would be defined:
@Distributive
public class MyTask implements CurationTask
A related issue concerns how non-distributive tasks report their status and results: the status will normally reflect only the last invocation of the task in the
container, so important outcomes could be lost. If a task declares itself @Suspendable, however, CS will cease processing when it encounters a FAIL
status. When used in the UI, for example, this would mean that if a virus scan is running over a collection, it would stop and return status (and result) to
the screen on the first infected item it encounters. You can tune @Suspendable tasks even more precisely by annotating which invocations you want to
suspend on. For example:
@Suspendable(invoked=Curator.Invoked.INTERACTIVE)
public class MyTask implements CurationTask
would mean that the task would suspend if invoked in the UI, but would run to completion if run on the command-line.
Only a few annotation types have been defined so far, but as the number of tasks grows, we can look for common behavior that can be signaled by
annotation. For example, there is a @Mutative type, which tells CS that the task may alter (mutate) the object it is working on.
Scripted Tasks
DSpace 1.8 introduced limited (and somewhat experimental) support for deploying and running tasks written in languages other than Java. Since version
6, Java has provided a standard API to invoke so-called scripting or dynamic-language code that runs on the Java virtual machine (JVM). Scripted
tasks are those written in a language accessible from this API. See Curation System - Scripted Tasks for information on configuring and running scripted
tasks.
Interface
Scripted tasks must implement a slightly different interface than the CurationTask interface used for Java tasks. The appropriate interface for scripting
tasks is ScriptedTask and has the following methods:
The difference is that ScriptedTask has separate perform methods for a DSO and for an identifier, because some scripting languages (e.g.
Ruby) don't support method overloading.
There is a class of use cases in which we want to construct or create new DSOs (DSpaceObjects) given an identifier in a task. In these cases, there may
be no live DSO to pass to the task.
You can actually get the curation system to call performId() if you queue a task and then process the queue: when reading the queue, all the CLI has is
the handle to pass to the task.
Curation tasks in Jython
As mentioned in the "Scripted Tasks" chapter of Curation Tasks, you can write your curation tasks in several languages, including Jython (a flavour of
Python running on the JVM).
Instructions are outdated and unproven in DSpace 7.x
Note: The installation location doesn't matter; it is not needed by DSpace. You can safely delete it after you retrieve jython.jar and Lib.
3. Install Jython to DSpace classpaths (step 2a already did this for you):
a. The goal is to put jython.jar and the jython Lib/ directory into every DSpace classpath you intend to use, so it must be installed in
both [dspace]/lib and the webapp that deploys to Tomcat (if you want to run tasks from the UI) - [dspace]/webapps/server/WEB-INF
/lib/. There are no special maven/pom extensions - just copy in the jar and Lib/.
b. You can use symlinks if you wish, as long as allowLinking (Tomcat <=7, Tomcat 8) is set to true in that context's configuration. However,
be warned that the Tomcat documentation lists allowLinking="true" as a possible security concern.
c. Note: Older versions of Jython mention the need for jython-engine.jar to implement JSR-223. Don't worry about that; newer Jython
versions (e.g. 2.7.1) don't require it.
4. Configure the curation framework to be aware of your new task(s):
a. Set up the location of scripted tasks in the curation system. This simply means adding a property to [dspace]/config/modules
/curate.cfg:
script.dir=${dspace.dir}/ctscripts
b. In this directory, create a text file named "task.catalog". This is a Java properties file where lines beginning with '#' are
comments. Add a line for each task you write. The syntax is as follows:
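Judging from the notes below, each catalog line maps a logical task name to a pipe-separated script engine name, script file name, and constructor invocation; the entry shown here is illustrative:

```
# logical task name = script engine name|script file name|constructor invocation
mytask = python|mytask.py|MyTask()
```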
Notes:
Don't put spaces around the pipe character or you'll get an error similar to this one:
ERROR org.dspace.curate.TaskResolver @ Script engine: 'python ' is not installed
The "script engine name" is whatever name (or alias) Jython registers in the JVM. You can use both "python" and "jython" as the
engine name (tested on Jython 2.7.1).
The logical task name can't conflict with existing (Java) task names, but otherwise any single-word token can be used.
The file name is just the script file name in the script.dir directory.
The "constructor invocation" is the language-specific way to create an object that implements the task interface - it's ClassName()
for Python.
c. If you want pretty names in the UI, configure the other curate.cfg properties - see "ui.tasknames" (or groups etc.)
5. Write your task.
In the directory configured above, create your task (with the name configured in "task.catalog").
The basic requirement of any scripted task is that it implements the ScriptedTask Java interface.
So for our example, the mytask.py file might look like this:
class MyTask(ScriptedTask):
    def init(self, curator, taskName):
        print "initializing with Jython"
    def performDso(self, dso):
        return 0  # 0 = success
    def performId(self, context, id):
        return 0
6. Invoke the task.
You can do this the same way you would invoke any task (from command line, in the admin UI, etc). The advantage of scripting is that you do not
need to restart your servlet container to test changes; each task's source code is reloaded when you launch the task, so you can just put the
updated script in place.
Example of invocation from command line:
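A typical invocation might look like the following (the task name and handle are placeholders; -t names the task, -i the object to curate, and -r the reporter):

```
[dspace]/bin/dspace curate -t mytask -i 123456789/123 -r -
```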
Note: "-r -" means that the script's standard output will be directed to the console. You can read more details in the "On the command line"
chapter of the Curation Tasks page.
See also
Curation Tasks page in the official documentation
Nailgun - for speeding up repeated runs of a dspace command from the command line
Note: since DSpace 4.0, there's a solution for running dspace CLI commands in batch: Executing streams of commands
Jython webapp for DSpace - general purpose (not curation task) webapp written in Jython, optionally with access to DSpace API
Development Tools Provided by DSpace
Date parser tester
Some parts of DSpace use a custom date/time parser (org.dspace.util.MultiFormatDateParser) which is driven by a table of regular
expressions, so it can match any of a variety of formats. The table is found in config/spring/api/discovery-solr.xml. To test new and altered
rules, you can use the DSpace command line tool's validate-date command. You can simply pass it a date/time string on the command line (dspace
validate-date 01-01-2015). You can pipe a stream of strings to be validated, one per line (dspace validate-date < test.data). Or you can
have it prompt you for each string to be tested (dspace validate-date).
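The idea behind such a multi-format parser can be sketched as a table of regular expressions mapped to date formats, tried in order until one matches. This is an illustrative re-implementation of the technique, not DSpace's actual MultiFormatDateParser code:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class MultiFormatDemo {
    // Ordered table of regex -> SimpleDateFormat pattern, like the rule table
    // that drives MultiFormatDateParser.
    static final Map<Pattern, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put(Pattern.compile("\\d{4}-\\d{2}-\\d{2}"), "yyyy-MM-dd");
        RULES.put(Pattern.compile("\\d{2}-\\d{2}-\\d{4}"), "dd-MM-yyyy");
    }

    static Date parse(String value) throws ParseException {
        for (Map.Entry<Pattern, String> rule : RULES.entrySet()) {
            if (rule.getKey().matcher(value).matches()) {
                return new SimpleDateFormat(rule.getValue()).parse(value);
            }
        }
        return null; // no rule matched: the string is not a recognized date
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(parse("2015-01-01") != null);
        System.out.println(parse("not a date") == null);
    }
}
```

Adding a new recognized format is then just a matter of adding one regex/format pair to the table, which is exactly what the validate-date command helps you test.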
Services to support Alternative Identifiers
Together with Item Level Versioning, an Identifier Service was introduced that makes it possible to integrate new identifiers. Currently the Identifier
Service is used for Items only, but this may change in future versions of DSpace. The identifiers used for different versions are a very important part of a
versioning strategy. The following documentation describes the Identifier Service in the context of Item Level Versioning; nevertheless, the
Identifier Service is also used for Items when Item Level Versioning is switched off.
Versioning Service
The Versioning Service is responsible for the replication of one or more Items when a new version is requested. The new version is not preserved in the
repository until the database transaction completes; thus, when errors arise in the versioning process, the database is properly kept in its original state
and the application will report that an exception has occurred and needs correction.
The Versioning Service relies on a generic IdentifierService, described below, for minting and registering any identifiers that are required to track
the revision history of the Items.
public interface VersioningService {
Identifier Service
The Identifier Service maintains an extensible set of IdentifierProvider services that are responsible for three important activities in identifier management:
1. Resolution: The IdentifierService acts in a manner similar to the existing HandleManager in DSpace, allowing for resolution of DSpace Items from
provided identifiers.
2. Minting: Minting is the act of reserving and returning an identifier that may be used with a specific DSpaceObject.
3. Registering: Registering is the act of recording the existence of a minted identifier with an external persistent resolver service. These services
may reside on the local machine (HandleManager) or exist as external services (PURL or EZID DOI registration services).
public interface IdentifierService {

    /**
     * @param context
     * @param dso
     * @param identifier
     * @return
     */
    String lookup(Context context, DSpaceObject dso, Class<? extends Identifier> identifier);

    /**
     * This will resolve a DSpaceObject based on a provided Identifier. The Service will interrogate
     * the providers in no particular order and return the first successful result discovered. If no
     * resolution is successful, the method will return null if no object is found.
     *
     * TODO: Verify null is returned.
     *
     * @param context
     * @param identifier
     * @return
     * @throws IdentifierNotFoundException
     * @throws IdentifierNotResolvableException
     */
    DSpaceObject resolve(Context context, String identifier)
        throws IdentifierNotFoundException, IdentifierNotResolvableException;

    /**
     * Reserves any identifiers necessary based on the capabilities of all providers in the service.
     *
     * @param context
     * @param dso
     * @throws org.dspace.authorize.AuthorizeException
     * @throws java.sql.SQLException
     * @throws IdentifierException
     */
    void reserve(Context context, DSpaceObject dso)
        throws AuthorizeException, SQLException, IdentifierException;

    /**
     * Used to reserve a specific identifier (for example a Handle, hdl:1234.5/6). The provider is
     * responsible for detecting and processing the appropriate identifier; all providers are
     * interrogated, and multiple providers can process the same identifier.
     *
     * @param context
     * @param dso
     * @param identifier
     * @throws org.dspace.authorize.AuthorizeException
     * @throws java.sql.SQLException
     * @throws IdentifierException
     */
    void reserve(Context context, DSpaceObject dso, String identifier)
        throws AuthorizeException, SQLException, IdentifierException;

    /**
     * @param context
     * @param dso
     * @return
     * @throws org.dspace.authorize.AuthorizeException
     * @throws java.sql.SQLException
     * @throws IdentifierException
     */
    void register(Context context, DSpaceObject dso)
        throws AuthorizeException, SQLException, IdentifierException;

    /**
     * Used to register a specific identifier (for example a Handle, hdl:1234.5/6). The provider is
     * responsible for detecting and processing the appropriate identifier; all providers are
     * interrogated, and multiple providers can process the same identifier.
     *
     * @param context
     * @param dso
     * @param identifier
     * @return
     * @throws org.dspace.authorize.AuthorizeException
     * @throws java.sql.SQLException
     * @throws IdentifierException
     */
    void register(Context context, DSpaceObject dso, String identifier)
        throws AuthorizeException, SQLException, IdentifierException;

    /**
     * Delete (unbind) all identifiers registered for a specific DSpace item. Identifiers are
     * "unbound" across all providers in no particular order.
     *
     * @param context
     * @param dso
     * @throws org.dspace.authorize.AuthorizeException
     * @throws java.sql.SQLException
     * @throws IdentifierException
     */
    void delete(Context context, DSpaceObject dso)
        throws AuthorizeException, SQLException, IdentifierException;

    /**
     * Used to delete a specific identifier (for example a Handle, hdl:1234.5/6). The provider is
     * responsible for detecting and processing the appropriate identifier; all providers are
     * interrogated, and multiple providers can process the same identifier.
     *
     * @param context
     * @param dso
     * @param identifier
     * @throws org.dspace.authorize.AuthorizeException
     * @throws java.sql.SQLException
     * @throws IdentifierException
     */
    void delete(Context context, DSpaceObject dso, String identifier)
        throws AuthorizeException, SQLException, IdentifierException;
}
Batch Processing
In the current DSpace design, database transactions are in most cases relatively long: from Context creation to the moment the Context is
completed. Especially when doing batch processing, that transaction can become very long. The new data access layer introduced in DSpace 6, which is
based on Hibernate, has built-in cache and auto-update mechanisms. But these mechanisms do not work well with long transactions and can even have an
exponentially adverse effect on performance.
Therefore we added a new method enableBatchMode() to the DSpace Context class which tells our database connection that we are going to do some
batch processing. The database connection (Hibernate in our case) can then optimize itself to deal with a large number of inserts, updates and deletes.
Hibernate will then no longer postpone update statements, which is better for batch processing. The method isBatchModeEnabled() lets
you check if the current Context is in "batch mode".
When dealing with a lot of records, it is also important to manage the size of the (Hibernate) cache. A large cache can also lead to decreased
performance and eventually to "out of memory" exceptions. To help developers better manage the cache, a method getCacheSize() was added to
the DSpace Context class that will give you the number of database records currently cached by the database connection. Another new method,
uncacheEntity(ReloadableEntity entity), will allow you to clear the cache (of a single object) and free up (heap) memory. The uncacheEntity() method
may be used to immediately remove an object from heap memory once the batch processing is finished with it. Besides the uncacheEntity() method,
the commit() method in the DSpace Context class will also clear the cache, flush all pending changes to the database and commit the current database
transaction. The database changes will then be visible to other threads.
BUT uncacheEntity() and commit() come at a price. After calling these methods, all previously fetched entities (Hibernate terminology for database
records) are "detached" (pending changes are not tracked anymore) and cannot be combined with "attached" entities. If you change a value in a detached
entity, Hibernate will not automatically push that change to the database. If you still want to change a value of a detached entity, or if you want to use that
entity in combination with attached entities (e.g. adding a bitstream to an item) after you have cleared the cache, you first have to reload that entity.
Reloading means asking the database connection to re-add the entity from the database to the cache and get a new object reference to the required entity.
From then on, it is important that you use that new object reference. To simplify the process of reloading detached entities, we've added a reloadEntity
(ReloadableEntity entity) method to the DSpace Context class, together with a new interface ReloadableEntity. This method will give the user a new
"attached" reference to the requested entity. All DSpace Objects and some extra classes implement the ReloadableEntity interface so that they can be
easily reloaded.
Examples on how to use these new methods can be found in the IndexClient class. But to summarize, when batch processing it is important that:
1. You put the Context into batch processing mode using the method:
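Based on the method names described above, step 1 presumably amounts to something like:

```
// Remember the original mode so it can be restored in step 3
boolean originalMode = context.isBatchModeEnabled();
context.enableBatchMode(true);
```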
2. Perform necessary batch operations, being careful to call uncacheEntity() whenever you complete operations on each object. Alternatively,
you can commit() the context once the object cache reaches a particular size (see getCacheSize()). Remember, once an object is
"uncached", you will have to reload it (see reloadEntity()) before you can work with it again:
// To prevent memory issues, discard the Item from the cache after processing
context.uncacheEntity(item);
}
// If you need to reuse your Collection *after* commit(), you have to reload it first
collection = context.reloadEntity(collection);
3. When you're finished with processing the records, you put the context back into its original mode:
context.enableBatchMode(originalMode);
Workflow
Configuration
The main workflow configuration can be found in the workflow.xml file, located at [dspace]/config/spring/api/workflow.xml . An example of
this workflow configuration file can be found below.
<beans>
<bean class="org.dspace.xmlworkflow.XmlWorkflowFactoryImpl">
<property name="workflowMapping">
<util:map>
<entry key="defaultWorkflow" value-ref="defaultWorkflow"/>
<!-- <entry key="123456789/4" value-ref="selectSingleReviewer"/>-->
<!-- <entry key="123456789/5" value-ref="scoreReview"/>-->
</util:map>
</property>
</bean>
<bean id="{workflow.id}"
class="org.dspace.xmlworkflow.state.Workflow">
<!-- Another workflow configuration-->
</bean>
</beans>
"name" attribute: a unique name used for the identification of the workflow and used in the workflow-to-collection mapping
"firstStep" property: the identifier of the first step of the workflow. This step will be the entry point of this workflow process. When a new item
has been committed to a collection that uses this workflow, the step configured in the "firstStep" property will be the first step the item goes
through.
"steps" property: a list of all steps within this workflow (in the order they will be processed).
"id" attribute: a unique identifier (in one workflow process) for the role
"description" property: optional attribute to describe the role
"scope" property: optional attribute that is used to find the group and must have one of the following values, which are defined as constant fields
of org.dspace.xmlworkflow.Role.Scope:
COLLECTION: The collection value specifies that the group will be configured at the level of the collection. This type of group is the
same as the type that existed in the original workflow system. In case no value is specified for the scope attribute, the workflow
framework assumes the role is a collection role.
REPOSITORY: The repository scope uses groups that are defined at repository level in DSpace. The name attribute should exactly
match the name of a group in DSpace.
ITEM: The item scope assumes that a different action in the workflow will assign a number of EPersons or Groups to a specific workflow-
item in order to perform a step. These assignees can be different for each workflow item.
"name" property: The name specified in the name attribute of a role will be used to look up an EPerson group in DSpace. The lookup will depend
on the scope specified in the "scope" attribute:
COLLECTION: The workflow framework will look for a group containing the name specified in the name attribute and the ID of the
collection for which this role is used.
REPOSITORY: The workflow framework will look for a group with the same name as the name specified in the name attribute.
ITEM: In case the item scope is selected, the name of the role attribute is not required.
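Putting the role attributes above together, a role definition might look like this hypothetical sketch (the bean id, class wiring, and property values are assumptions drawn from the attributes described here; consult the workflow.xml shipped with your DSpace for the exact syntax):

```xml
<!-- Hypothetical role definition; property names follow the attributes described above -->
<bean id="reviewer" class="org.dspace.xmlworkflow.Role">
    <property name="name" value="Reviewer"/>
    <property name="description" value="The people responsible for reviewing submissions"/>
    <property name="scope" value="COLLECTION"/>
</bean>
```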
"name" attribute: The name attribute specifies a unique identifier for the step. This identifier will be used when configuring other steps in order to
point to this step. This identifier can also be used when configuring the start step of the workflow item.
"userSelectionMethod" property: This attribute defines the UserSelectionAction that will be used to determine how to attach users to this
step for a workflow-item. The value of this attribute must refer to the identifier of an action bean in the workflow-actions.xml. Examples of the user
attachment to a step are the currently used system of a task pool or as an alternative directly assigning a user to a task.
"role" property: optional attribute that must point to the id attribute of a role element specified for the workflow. This role will be used to define
the epersons and groups used by the userSelectionMethod.
RequiredUsers
Each step contains a number of actions that the workflow item will go through. In case the action has a user interface, the users responsible for the
execution of this step will have to execute these actions before the workflow item can proceed to the next action or the end of the step.
There is also an optional subsection that can be defined for a step part called "outcomes". This can be used to define outcomes for the step that differ
from the one specified in the nextStep attribute. Each action returns an integer depending on the result of the action. The default value is "0" and will make
the workflow item proceed to the next action or to the end of the step.
In case an action returns a different outcome than the default "0", the alternative outcomes will be used to lookup the next step. The "outcomes" element
contains a number of steps, each having a status attribute. This status attribute defines the return value of an action. The value of the element will be used
to lookup the next step the workflow item will go through in case an action returns that specified status.
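The outcomes mechanism described above might be sketched as follows (a hypothetical step bean; the ids, referenced beans, and property wiring are illustrative, not copied from DSpace):

```xml
<!-- Hypothetical step: if an action returns status 1, the item is routed to
     "editstep" instead of the default next step. -->
<bean id="reviewstep" class="org.dspace.xmlworkflow.state.Step">
    <property name="userSelectionMethod" ref="claimaction"/>
    <property name="role" ref="reviewer"/>
    <property name="outcomes">
        <util:map>
            <entry key="1" value-ref="editstep"/>
        </util:map>
    </property>
    <property name="actions">
        <util:list>
            <ref bean="reviewaction"/>
        </util:list>
    </property>
</bean>
```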
API configuration
The workflow actions configuration is located in the [dspace]/config/spring/api/ directory and is named "workflow-actions.xml". This
configuration file describes the different Action Java classes that are used by the workflow framework. Because the workflow framework uses the Spring
framework for loading these action classes, this configuration file contains Spring configuration.
This file contains the beans for the actions and user selection methods referred to in the workflow.xml. In order for the workflow framework to work
properly, each of the required actions must be part of this configuration.
<beans
    xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:util="http://www.springframework.org/schema/util"
    xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.0.xsd
        http://www.springframework.org/schema/util http://www.springframework.org/schema/util/spring-util-2.0.xsd">
<!-- Below the class identifiers come the declarations for our actions/userSelectionMethods -->
User selection action: This type of action is always the first action of a step and is responsible for the user selection process of that step. In case a
step has no role attached, no user will be selected and the NoUserSelectionAction is used.
Processing action: This type of action is used for the actual processing of a step. Processing actions contain the logic required to execute the
required operations in each step. Multiple processing actions can be defined in one step. The user and the workflow item will go through these
actions in the order they are specified in the workflow configuration, unless an alternative outcome is returned by one of them.
This bean defines a new UserSelectionActionConfig and the following child tags:
constructor-arg: This is a constructor argument containing the ID of the task. This is the same as the id attribute of the bean and is used by the
workflow configuration to refer to this action.
property processingAction: This tag refers to the ID of the API bean responsible for the implementation of the API side of this action. This
bean should also be configured in this XML.
property requiresUI: In case this property is true, the workflow framework will expect a user interface for the action. Otherwise the framework
will automatically execute the action and proceed to the next one.
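A user-selection action bean using only the child tags described above might therefore look like this sketch (the ids and the referenced processing-action bean are illustrative; compare with the beans shipped in workflow-actions.xml):

```xml
<bean id="claimaction"
      class="org.dspace.xmlworkflow.state.actions.UserSelectionActionConfig"
      scope="prototype">
    <!-- constructor-arg: the task ID, matching the bean's id attribute -->
    <constructor-arg type="java.lang.String" value="claimaction"/>
    <!-- processingAction: the API-side bean for this action -->
    <property name="processingAction" ref="claimactionAPI"/>
    <!-- requiresUI: this action presents a user interface -->
    <property name="requiresUI" value="true"/>
</bean>
```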
Processing Action
Processing actions are configured similarly to the user selection actions. The only difference is that these processing action beans are implementations of
the WorkflowActionConfig class instead of the UserSelectionActionConfig class.
Authorizations
Currently, authorizations are always granted and revoked based on the tasks that are available for certain users and groups. The types of authorization
policies that are granted for each of these are always the same:
READ
WRITE
ADD
DELETE
Database
The workflow uses a separate metadata schema named workflow. The fields this schema contains can be found in the file workflow-types.xml in the
[dspace]/config/registries directory. At the moment this schema is only used by the score reviewing system, but one
could always use this schema if metadata is required for custom workflow steps.
The following tables have been added to the DSpace database. All tables are prefixed with 'cwf_' to avoid any confusion with the existing workflow related
database tables:
cwf_workflowitem
The cwf_workflowitem table contains the different workflowitems in the workflow. This table has the following columns:
workflowitem_id: The identifier of the workflowitem and primary key of this table
item_id: The identifier of the DSpace item to which this workflowitem refers.
collection_id: The collection to which this workflowitem is submitted.
multiple_titles: Specifies whether the submission has multiple titles (important for submission steps)
published_before: Specifies whether the submission has been published before (important for submission steps)
multiple_files: Specifies whether the submission has multiple files attached (important for submission steps)
cwf_collectionrole
The cwf_collectionrole table represents a workflow role for one collection. This type of role is the same as the roles that existed in the original workflow,
meaning that for each collection a separate group is defined to describe the role. The cwf_collectionrole table has the following columns:
collectionrole_id: The identifier of the collectionrole and the primary key of this table
role_id: The identifier/name used by the workflow configuration to refer to the collectionrole
collection_id: The collection identifier for which this collectionrole has been defined
group_id: The group identifier of the group that defines the collection role
cwf_workflowitemrole
The cwf_workflowitemrole table represents roles that are defined at the level of an item. These roles are temporary roles and only exist during the
execution of the workflow for that specific item. Once the item is archived, the workflowitemrole is deleted. Multiple rows can exist for one workflowitem,
e.g. one row containing a group and a few containing epersons. All these rows together make up the workflowitemrole. The cwf_workflowitemrole table
has the following columns:
workflowitemrole_id: The identifier of the workflowitemrole and the primary key of this table
role_id: The identifier/name used by the workflow configuration to refer to the workflowitemrole
workflowitem_id: The cwf_workflowitem identifier for which this workflowitemrole has been defined
group_id: The group identifier of the group that defines the workflowitemrole role
eperson_id: The eperson identifier of the eperson that defines the workflowitemrole role
cwf_pooltask
The cwf_pooltask table represents the different task pools that exist for a workflowitem. These task pools can be available at the beginning of a step and
contain all the users that are allowed to claim a task in this step. Multiple rows can exist for one task pool, containing multiple groups and epersons. The
cwf_pooltask table has the following columns:
pooltask_id: The identifier of the pooltask and the primary key of this table
workflowitem_id: The identifier of the workflowitem for which this task pool exists
workflow_id: The identifier of the workflow configuration used for this workflowitem
step_id: The identifier of the step for which this task pool was created
action_id: The identifier of the action that needs to be displayed/executed when the user selects the task from the task pool
eperson_id: The identifier of an eperson that is part of the task pool
group_id: The identifier of a group that is part of the task pool
cwf_claimtask
The cwf_claimtask table represents a task that has been claimed by a user. Claimed tasks can be assigned to users or can be the result of a claim from
the task pool. Because a step can contain multiple actions, the claimed task defines the action at which the user has arrived in a particular step. This
makes it possible to stop working halfway through the step and continue later. The cwf_claimtask table contains the following columns:
claimtask_id: The identifier of the claimtask and the primary key of this table
workflowitem_id: The identifier of the workflowitem for which this task exists
workflow_id: The id of the workflow configuration that was used for this workflowitem
step_id: The step that is currently processing the workflowitem
action_id: The action that should be executed by the owner of this claimtask
owner_id: References the eperson that is responsible for the execution of this task
cwf_in_progress_user
The cwf_in_progress_user table keeps track of the different users that are performing a certain step. This table is used because some steps might require
multiple users to perform the step before the workflowitem can proceed. The cwf_in_progress_user table contains the following columns:
in_progress_user_id: The identifier of the in progress user and the primary key of this table
workflowitem_id: The identifier of the workflowitem for which the user is performing or has performed the step.
user_id: The identifier of the eperson that is performing or has performed the task
finished: Keeps track of the fact that the user has finished the step or is still in progress of the execution
DSpace Reference
Configuration Reference
DSpace Item State Definitions
Directories and Files
Metadata and Bitstream Format Registries
Architecture
Application Layer
Business Logic Layer
DSpace Services Framework
Storage Layer
History
Changes in 7.x
Changes in Older Releases
Configuration Reference
There are a number of ways in which DSpace may be configured and/or customized. This chapter of the documentation discusses the configuration of
the software; customizations are covered in the following chapter.
For ease of use, the Configuration documentation is broken into several parts:
General Configuration - addresses general conventions used with configuring the local.cfg file, dspace.cfg and other configuration files
which use similar conventions.
The local.cfg Configuration Properties File - describes how to use the local.cfg file to store all your locally customized configurations
The dspace.cfg Configuration Properties File - specifies the basic dspace.cfg file settings (these settings specify the default configuration for
DSpace)
Optional or Advanced Configuration Settings - contains other, more advanced settings that are optional in the dspace.cfg configuration file.
1 General Configuration
1.1 Configuration File Syntax
1.1.1 Special Characters
1.1.2 Specifying Multiple Values for Properties
1.1.3 Including other Property Files
1.2 Configuration Scheme for Reloading and Overriding
1.3 Why are there multiple copies of some config files?
2 The local.cfg Configuration Properties File
3 The dspace.cfg Configuration Properties File
3.1 Main DSpace Configurations
3.2 General Solr Configuration
3.3 DSpace Database Configuration
3.3.1 To provide the database connection pool externally
3.4 DSpace Email Settings
3.4.1 Wording of E-mail Messages
3.4.1.1 Templates can set message headers
3.5 File Storage
3.6 Logging Configuration
3.7 General Plugin Configuration
3.8 Configuring the Search Engine
3.9 Handle Server Configuration
3.10 Delegation Administration: Authorization System Configuration
3.11 Inheritance of collection default policy (since 7.1)
3.12 Login as feature
3.13 Restricted Item Visibility Settings
3.14 Proxy Settings
3.15 Configuring Media Filters
3.16 Crosswalk and Packager Plugin Settings
3.16.1 Configurable MODS Dissemination Crosswalk
3.16.2 XSLT-based Crosswalks
3.16.2.1 Testing XSLT Crosswalks
3.16.3 Configurable Qualified Dublin Core (QDC) dissemination crosswalk
3.16.4 Configuring Crosswalk Plugins
3.16.5 Configuring Packager Plugins
3.17 Event System Configuration
3.18 Embargo
3.19 Checksum Checker Settings
3.20 Item Export and Download Settings
3.21 Subscription Emails
3.22 Hiding Metadata
3.23 Settings for the Submission Process
3.24 Configuring the Sherpa/RoMEO Integration
3.25 Configuring Creative Commons License
3.26 WEB User Interface Configurations
3.27 Item Counts in user interface
3.28 Browse Index Configuration
3.28.1 Defining the storage of the Browse Data
3.28.2 Defining the Indexes
3.28.3 Defining Sort Options
3.28.4 Hierarchical Browse Indexes
3.28.5 Other Browse Options
3.28.6 Browse Index Authority Control Configuration
3.28.7 Tag cloud
3.29 Links to Other Browse Contexts
3.30 Submission License Substitution Variables
3.31 Syndication Feed (RSS) Settings
3.32 OpenSearch Support
3.33 Content Inline Disposition Threshold / Format
3.34 Multi-file HTML Document/Site Settings
3.35 Sitemap Settings
3.36 Authority Control Settings
3.37 Configuring Multilingual Support
3.37.1 Setting the Default Language for the Application
3.37.2 Supporting More Than One Language
3.37.2.1 Changes in dspace.cfg
3.37.2.2 Related Files
3.38 Upload File Settings
3.39 SFX Server (OpenURL)
3.40 Controlled Vocabulary Settings
4 Optional or Advanced Configuration Settings
4.1 The Metadata Format and Bitstream Format Registries
4.1.1 Metadata Format Registries
4.1.2 Bitstream Format Registry
4.2 Configuring Usage Instrumentation Plugins
4.2.1 The Passive Plugin
4.2.2 The Tab File Logger Plugin
4.3 Behavior of the workflow system
4.4 Recognizing Web Spiders (Bots, Crawlers, etc.)
5 Command-line Access to Configuration Properties
General Configuration
The following sections describe the configuration files you will need to edit to make your DSpace installation work.
DSpace provides a number of textual configuration files which may be used to configure your site based on local needs. These include:
[dspace]/config/dspace.cfg : The primary configuration file, which contains the main configurations for DSpace.
[dspace]/config/modules/*.cfg : Module configuration files, which are specific to various modules/features within DSpace.
[dspace]/config/local.cfg : An optional, but highly recommended, localized copy of configurations/settings specific to your DSpace (see The local.cfg Configuration Properties File below)
Additional feature-specific configuration files also exist under [dspace]/config/, some of these include:
default.license : the default deposit license used by DSpace during the submission process (see Submission User Interface documentation)
hibernate.cfg.xml : The Hibernate class configuration for the DSpace database (almost never requires changing)
item-submission.xml : the default item submission process for DSpace (see Submission User Interface documentation)
launcher.xml : The configuration of the DSpace command-line "launcher" ([dspace]/bin/dspace, see the DSpace Command Launcher documentation)
log4j2.xml : The default logging settings for DSpace log files (usually placed in [dspace]/log)
submission-forms.xml : The default deposit forms for DSpace, used by item-submission.xml (see Submission User Interface documentation)
As most of these configurations are detailed in other areas of the DSpace documentation (see links above), this section concentrates primarily on the "*.cfg" configuration files (namely dspace.cfg and local.cfg).
All DSpace *.cfg files use the Apache Commons Configuration properties file syntax. This syntax is very similar to a standard Java properties file, with a
few notable enhancements described below.
Comments all start with a "#" symbol. These lines are ignored by DSpace.
Other settings appear as property/value pairs of the form: property.name = property value
Certain special characters (namely commas) MUST BE escaped. See the "Special Characters" section below
Values assigned in the same *.cfg file are "additive", and result in an array of values. See "Specifying Multiple Values for Properties" below.
Some property defaults are "commented out": they have a "#" preceding them, and the DSpace software ignores the config property. This may cause the feature to be disabled, or cause a built-in default value to be used instead.
The property value may contain references to other configuration properties, in the form ${property.name}. A property may not refer to itself. Examples:
dspace.dir = /path/to/dspace
dspace.name = My DSpace
# This value references dspace.dir above, and resolves to "/path/to/dspace/assetstore"
assetstore.dir = ${dspace.dir}/assetstore
# However, this will result in an ERROR, as the property cannot reference itself
property3.name = ${property3.name}
Special Characters
Certain characters in *.cfg files are considered special characters, and must be escaped in any values. The most notable of these special characters
include:
Commas (,) : as they represent lists or arrays of values (see "Specifying Multiple Values for Properties" below)
Backslashes (\) : as this is the escape character
This means that if a particular setting needs to use one of these special characters in its value, it must be escaped. Here are a few examples:
# WRONG SETTING
# This setting is INVALID. DSpace is expecting your site name to be a single value,
# But, this setting would create an array of two values: "DSpace" and "My Institution"
dspace.name = DSpace, My Institution
# WRONG SETTING
# As the backslash is the escape character, this won't work
property.name = \some\path
# CORRECT SETTING
# If you want a literal backslash, you need to escape it with "\\"
# So, the below value will be returned as "\some\path"
property.name = \\some\\path
Additional examples of escaping special characters are provided in the documentation of the Apache Commons Configuration properties file syntax.
Specifying Multiple Values for Properties
In some situations (e.g. enabling multiple plugins), a single property needs to be assigned multiple values. To do so, simply repeat the same property name, once per value. For example:
# The below settings define *two* AuthenticationMethods that will be enabled, LDAP and Password authentication
# Notice how the same property name is simply repeated, and passed different values.
plugin.sequence.org.dspace.authenticate.AuthenticationMethod = org.dspace.authenticate.LDAPAuthentication
plugin.sequence.org.dspace.authenticate.AuthenticationMethod = org.dspace.authenticate.PasswordAuthentication
Please be aware that this ONLY works if you are reusing the exact same configuration key in the same configuration file. This causes the values to be "additive" (i.e. they are appended to the same list).
However, as you'll see below, the local.cfg file always overrides settings elsewhere. So, if the above "AuthenticationMethod" plugin was specified in
both your authentication.cfg and your local.cfg, the value(s) in your local.cfg would override the defaults in your authentication.cfg
(more on that below).
Additional examples of creating lists or arrays of values are provided in the documentation of the Apache Commons Configuration properties file syntax.
Including Other Property Files
Any *.cfg file may also "include" other property files. For example, the dspace.cfg includes/embeds all of the default config/modules/*.cfg files via a series of "include=" settings near the bottom of the dspace.cfg. As an example, here's a small subset of those include calls:
# defines our modules subdirectory
module_dir = modules
# The following lines include specific "authentication*.cfg" files inside your dspace.cfg
# This essentially "embeds" their configurations into your dspace.cfg,
# treating them as if they were a single configuration file.
include = ${module_dir}/authentication.cfg
include = ${module_dir}/authentication-ip.cfg
include = ${module_dir}/authentication-ldap.cfg
include = ${module_dir}/authentication-password.cfg
include = ${module_dir}/authentication-shibboleth.cfg
This ability to include properties files within others is very powerful, as it allows you to inherit settings from other files, or subdivide large configuration files.
Be aware that this essentially causes DSpace to treat all included configurations as if they were part of the parent file. This means that, in the above
example, as far as DSpace is concerned, all the settings contained within the authentication*.cfg files "appear" as though they are specified in the
main dspace.cfg.
This ability to include other files is also possible with the local.cfg file, should you want to subdivide your localized settings into several locally specific
configuration files.
While the DSpace API supports dynamically reloading configurations, the user or machine interfaces may still cache some configuration settings. This
means that while the API layer may reload a new value, that new value may not always affect/change the behavior of your user interface (until you restart
Tomcat).
Also, please be aware that all DSpace configuration values loaded into Spring beans (for example configurations that appear in Spring XML configuration
files or in @Value annotations) are cached by Spring. This means that they will not be reloadable within Spring beans until Tomcat is restarted.
Because DSpace uses Apache Commons Configuration, its configurations can be reloaded without restarting your servlet container (e.g. Tomcat). By default, DSpace checks for changes to any of its runtime configuration files every 5 seconds. If a change has been made, the configuration file is reloaded. The 5 second interval is configurable in the config-definition.xml (which defines the configuration scheme DSpace uses).
Additionally, DSpace provides the ability to easily override default configuration settings (in dspace.cfg or modules/*.cfg) using a local.cfg file (see The local.cfg Configuration Properties File) or using System Properties / Environment Variables.
Both of these features are defined in DSpace's default "configuration scheme" or "configuration definition" in the [dspace]/config/config-
definition.xml file. This file defines the Apache Commons Configuration settings that DSpace utilizes by default. It is a valid "configuration definition"
file as defined by Apache Commons Configuration. See their Configuration Definition File Documentation for more details.
You are welcome to customize the config-definition.xml to customize your local configuration scheme as you see fit. Any customizations to this
file will require restarting your servlet container (e.g. Tomcat).
By default, the DSpace config-definition.xml file defines the following configuration scheme:
Configuration File Syntax/Sources: All DSpace configurations are loaded via Properties files (using the Configuration File Syntax detailed above)
Note: Apache Commons Configuration does support other configuration sources such as XML configurations or database configurations
(see its Overview documentation). At this time, DSpace does not utilize these other sorts of configurations by default. However, it would
be possible to customize your local config-definition.xml to load settings from other locations.
Configuration Files/Sources: By default, only two configuration files are loaded into Apache Commons Configuration for DSpace:
local.cfg (see The local.cfg Configuration Properties File documentation below)
dspace.cfg (NOTE: all modules/*.cfg are loaded by dspace.cfg via "include=" statements at the end of that configuration file.
They are essentially treated as sub-configs which are embedded/included into the dspace.cfg)
Configuration Override Scheme: The configuration override scheme is defined as follows. Configurations specified in earlier locations will
automatically override any later values:
System Properties (-D[setting]=[value]) override all other options
Environment Variables.
DSpace provides a custom environment variable syntax as follows:
All periods (.) in configuration names must be translated to "__P__" (two underscores, capital P, two underscores), e.g. a "dspace__P__dir" environment variable will override the "dspace.dir" configuration in local.cfg (or other *.cfg files)
All dashes (-) in configuration names must be translated to "__D__" (two underscores, capital D, two underscores), e.g. an "authentication__D__ip__P__groupname" environment variable will override the "authentication-ip.groupname" configuration in local.cfg (or other *.cfg files)
local.cfg
dspace.cfg (and all modules/*.cfg files) contain the default values for all settings.
Configuration Auto-Reload: By default, all configuration files are automatically checked every 5 seconds for changes. If they have changed, they
are automatically reloaded.
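The environment variable name translation described above can be sketched in the shell. The exported values below are purely hypothetical examples, not recommended settings:

```shell
# Hypothetical overrides using the environment variable syntax described above:
# '.' becomes '__P__' and '-' becomes '__D__'.
export dspace__P__dir=/dspace
export authentication__D__ip__P__groupname="My Group"

# The same name translation, performed mechanically with sed:
echo "authentication-ip.groupname" | sed -e 's/\./__P__/g' -e 's/-/__D__/g'
# prints: authentication__D__ip__P__groupname
```

Remember that these environment variables must be visible to the servlet container process (e.g. set in Tomcat's environment) for DSpace to pick them up.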
For more information on customizing our default config-definition.xml file, see the Apache Commons Configuration documentation on the configuration
definition file. Internally, DSpace simply uses the DefaultConfigurationBuilder class provided by Apache Commons Configuration to initialize our
configuration scheme (and load all configuration files).
Customizing the default configuration scheme
Because the config-definition.xml file is just a Configuration Definition file for Apache Commons Configuration, you can also choose to customize
the above configuration scheme based on your institution's local needs. This includes, but is not limited to, changing the name of "local.cfg", adding
additional configuration files/sources, or modifying the override or auto-reload schemes. For more information, see the Configuration Definition File
Documentation from Apache Commons Configuration.
Why are there multiple copies of some config files?
DSpace maintains two copies of most of its configuration files:
1. The "source" configuration file(s), found under [dspace-source]/dspace/config/ or its subdirectories. This also includes the [dspace-source]/local.cfg
2. The "runtime" configuration file(s), found in [dspace]/config/
The DSpace server (webapp) and command line programs only look at the runtime configuration file(s).
When you are revising/changing your configuration values, it may be tempting to only edit the runtime file. DO NOT do this. Whenever you rebuild DSpace,
it will "reset" your runtime configuration to whatever is in your source directories (the previous runtime configuration is copied to a date suffixed file, should
you ever need to restore it).
Instead, we recommend always making the same changes to the source version of the configuration file in addition to the runtime file. In other words, the source and runtime files should always be identical / kept in sync.
One way to keep the two files in synchronization is to edit your files in [dspace-source]/dspace/config/ and then run the following commands to
rebuild DSpace and install the updated configs:
cd [dspace-source]/dspace/
mvn package
cd [dspace-source]/dspace/target/dspace-installer
ant update_configs
This will copy the source configuration files into the runtime ([dspace]/config) directory. Another option is to manually sync the files by copying them to each directory.
Please note that there are additional "ant" commands to help with configuration management:
"ant update_configs" ==> Moves existing configs in [dspace]/config/ to *.old files and replaces them with what is in [dspace-source]/dspace/config/
"ant -Doverwrite=false update_configs" ==> Leaves existing configs in [dspace]/config/ intact. Just copies new configs from [dspace-source]/dspace/config/ over to *.new files.
The local.cfg Configuration Properties File
As of DSpace 6 and above, the old "build.properties" configuration file has been replaced by this new "local.cfg" configuration file. For individuals who are familiar with the old build.properties file, this new local.cfg differs in a few key ways:
Unlike build.properties, the local.cfg file can be used to override ANY setting in any other configuration file (dspace.cfg or modules/*.cfg). To override a default setting, simply copy the configuration into your local.cfg and change its value(s).
Unlike build.properties, the local.cfg file is not utilized during the compilation process (e.g. mvn package). But, it is automatically copied alongside the final dspace.cfg into your installation location ([dspace]/config/), where it overrides default DSpace settings with your locally specific settings at runtime.
Like build.properties, the local.cfg file is expected to be specified in the source directory by default ([dspace-source]). There is an example ([dspace-source]/dspace/config/local.cfg.EXAMPLE) provided which you can use to create a [dspace-source]/dspace/config/local.cfg.
Many configurations have changed names between DSpace 5 (and below) and DSpace 6 (and above)
If you are upgrading from an earlier version of DSpace, you will need to be aware that many configuration names/keys have changed. Because Apache
Commons Configuration allows for auto-overriding of configurations, all configuration names/keys in different *.cfg files MUST be uniquely named
(otherwise accidental, unintended overriding may occur).
In order to create this powerful ability to override configurations in your local.cfg, all modules/*.cfg files had their configurations renamed to be
prepended with the module name. As a basic example, all the configuration settings within the modules/oai.cfg configuration now start with "oai.".
Additionally, while the local.cfg may look similar to the old build.properties, many of its configurations have slightly different names. So, simply
copying your build.properties into a local.cfg will NOT work.
This means that DSpace 5.x (or below) configurations are NOT guaranteed compatible with DSpace 6. While you can obviously use your old configurations as a reference, you will need to start with a fresh copy of all configuration files and reapply any necessary configuration changes (this has always been the recommended procedure). However, as you'll see below, you'll likely want to do that anyway in order to take full advantage of the new local.cfg file.
It is possible to easily override default DSpace configurations (from dspace.cfg or modules/*.cfg files) in your own local.cfg configuration file.
An example [dspace-source]/dspace/config/local.cfg.EXAMPLE is provided with DSpace. The example only provides a few key configurations which most DSpace sites are likely to need to customize. However, you may add (or remove) any other configuration in your local.cfg to customize it as you see fit.
To get started, simply create your own [dspace-source]/dspace/config/local.cfg based on the example, e.g.
cd [dspace-source]/dspace/config/
cp local.cfg.EXAMPLE local.cfg
You can then begin to edit your local.cfg with your local settings for DSpace. There are a few key things to note about the local.cfg file:
Override any default configurations: Any setting in your local.cfg will automatically OVERRIDE a setting of the same name in the dspace.cfg or any modules/*.cfg file. This also means that you can copy ANY configuration (from dspace.cfg or any modules/*.cfg file) into your local.cfg to specify a new value.
For example, specifying dspace.name in local.cfg will override the default value of dspace.name in dspace.cfg.
Also, specifying oai.solr.url in local.cfg will override the default value of oai.solr.url in config/modules/oai.cfg.
Configuration Syntax: The local.cfg file uses the Apache Commons Configuration property file syntax (like all *.cfg files). For more information see the section on Configuration File Syntax above.
This means the local.cfg also supports enhanced features like the ability to include other config files (via "include=" statements).
Override local.cfg via System Properties: As needed, you are also able to OVERRIDE settings in your local.cfg by specifying them as System Properties or Environment Variables.
For example, if you wanted to change your dspace.dir in development/staging environment, you could specify it as a System Property
(e.g. -Ddspace.dir=[new-location]). This new value will override any value in both local.cfg and dspace.cfg.
When you build DSpace (e.g. mvn package), this local.cfg file will be automatically copied to [dspace]/config/local.cfg. Similar to the dspace.cfg, the "runtime" configuration (used by DSpace) is the one in [dspace]/config/local.cfg. See the Why are there multiple copies of some config files? question above for more details on the runtime vs source configuration.
Here's a very basic example of settings you could place into your local.cfg file (with inline comments):
# This is a simple example local.cfg file which shows off options
# for creating your own local.cfg
# This overrides the default "dspace.ui.url" setting, pointing it at our public UI URL
dspace.ui.url = https://dspace.myuniversity.edu
# If our database settings are the same as the default ones in dspace.cfg,
# then, we may be able to simply customize the db.username and db.password
db.username = myuser
db.password = mypassword
# For DSpace, we want the LDAP and Password authentication plugins enabled
# This overrides the default AuthenticationMethod in /config/modules/authentication.cfg
# Since we specified the same key twice, these two values are appended (see Configuration File Syntax above)
plugin.sequence.org.dspace.authenticate.AuthenticationMethod = org.dspace.authenticate.LDAPAuthentication
plugin.sequence.org.dspace.authenticate.AuthenticationMethod = org.dspace.authenticate.PasswordAuthentication
Remember, any of the below dspace.cfg settings can be copied into your local.cfg configuration file and overridden. So, rather than editing the dspace.cfg (or any of the modules/*.cfg), it's recommended to simply override the default values in your local.cfg. That way, your local.cfg can serve as the record of which configurations you have actually tweaked in your DSpace, which may help to simplify future upgrades.
The dspace.cfg contains basic information about a DSpace installation, including system path information, network host information, and other similar items. It is the default configuration file for DSpace, used by DSpace when it is actively running. However, as noted above, any of these default configurations may be overridden in your own local.cfg configuration file.
Property: dspace.dir
Example Value: /dspace
Informational Note: Root directory of DSpace installation. Omit the trailing slash '/'. Note that this setting is used by default in other settings, e.g. assetstore.dir. (On Windows be sure to use forward slashes for the directory path! For example: "C:/dspace" is a valid path for Windows.)
Property: dspace.server.url
Example Value: https://dspace.myu.edu/server
Informational Note: Main URL at which the DSpace backend ("server" webapp) is publicly available. If using port 80 (HTTP) or 443 (HTTPS), you may omit the port number. Otherwise the port number must be included. Do not include a trailing slash ('/'). In Production, you must use HTTPS if you wish to access the REST API from a different server/domain.
This configuration should match the User Interface's "rest" settings in the config.yml (specifically the "ssl", "host", "port" and "nameSpace" settings in that file). See User Interface Configuration.
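For reference, the matching "rest" section of the User Interface's config.yml might look like the sketch below, assuming the example URL above (the hostname is illustrative):

```yaml
# UI config.yml "rest" section matching dspace.server.url = https://dspace.myu.edu/server
rest:
  ssl: true
  host: dspace.myu.edu
  port: 443
  nameSpace: /server
```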
Property: dspace.server.ssr.url
Example Value: http://localhost:8080/server
Informational Note: (7.6.3 and later) Optional, separate Server-Side Rendering URL for the REST API. When specified, this URL will be used by the DSpace User Interface during server-side execution. This may be a private or localhost URL, but it must be accessible to the User Interface's server-side code. For example, the "dspace.server.url" (above) should always reference a public URL like "https://mydspace.edu/server". But, you could set "dspace.server.ssr.url" to a localhost URL (e.g. "http://localhost:8080/server") if your REST API is running on the same machine as the User Interface. This configuration would result in all client-side code (running in the user's browser) accessing the REST API via the "dspace.server.url" (a public, HTTPS URL), while the server-side code (triggered by SSR) would access the REST API via "dspace.server.ssr.url" (a localhost HTTP URL).
When this configuration is specified you must set the same URL in the User Interface's "rest.ssrBaseUrl" setting in its config.yml. See User Interface Configuration.
Property: dspace.ui.url
Example Value: dspace.ui.url = http://dspacetest.myu.edu:4000
Informational Note: Main URL at which the DSpace frontend (Angular User Interface) is publicly available. If using port 80 (HTTP) or 443 (HTTPS), you may omit the port number. Otherwise the port number must be included. Do not include a trailing slash ('/'). In Production, you should be using HTTPS for security purposes.
This URL should match the URL you type in the browser to access your User Interface. In the backend, this URL is primarily used to build UI-based URLs in sitemaps, email messages, etc. Therefore, it need not be set on initial installation, but it should be configured as soon as your user interface is installed. If you are not using the DSpace UI (and running the backend "headless"), this may be set to the URL of whatever you consider your primary "user interface".
Property: dspace.name
Informational Note: Short and sweet site name, used in e-mails, exports and machine interfaces (e.g. OAI-PMH). It is not currently used by the Angular UI.
See also the additional Solr configuration properties for specific indexes such as search, statistics, authority and OAI PMH.
Property: solr.server
Informational Note: Base URL to the Solr server. Specific indexes append to this value.

Property: solr.client.maxTotalConnections
Example Value: solr.client.maxTotalConnections = 20
Informational Note: The maximum number of connections that will be opened between DSpace and Solr.

Property: solr.client.maxPerRoute
Example Value: solr.client.maxPerRoute = 15
Informational Note: The maximum number of connections that will be opened between DSpace and a specific Solr instance (if you have more than one).

Property: solr.client.keepAlive
Informational Note: The default amount of time that a connection in use will be held open, in milliseconds. Solr may specify a different keep-alive interval and it will be obeyed.

Property: solr.client.timeToLive
Informational Note: The maximum amount of time before an open connection will be closed when idle, in seconds. New connections will be opened as needed, subject to the above limits.
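For example, a site running Solr on a separate host could override the base URL and pool settings in its local.cfg. The hostname below is hypothetical:

```
# Hypothetical local.cfg override pointing DSpace at a dedicated Solr server
solr.server = http://solr.myu.edu:8983/solr
# Optional connection pool tuning, using the properties described above
solr.client.maxTotalConnections = 20
solr.client.maxPerRoute = 15
```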
Oracle support has been deprecated in DSpace. It will no longer be supported as of June/July 2023. See https://github.com/DSpace/DSpace/issues/8214
Property: db.url
Informational Note: The JDBC URL of the DSpace database. The default value configures PostgreSQL. When using Oracle, use this format: jdbc:oracle:thin:@//host:port/dspace

Property: db.username
Informational Note: In the installation directions, the administrator is instructed to create the user "dspace" who will own the database "dspace".

Property: db.password
Informational Note: This is the password you were prompted for during the installation process (cf. 3.2.3. Installation).

Property: db.schema
Informational Note: If your database contains multiple schemas, you can avoid problems with retrieving the definitions of duplicate objects by uncommenting this entry and specifying the schema name used for DSpace. This property is optional.
For PostgreSQL databases, this is often best set to "public" (the default schema). For Oracle databases, the schema is usually equivalent to the username of your database account, so for Oracle this may be set to ${db.username} in most scenarios.

Property: db.maxconnections
Example Value: db.maxconnections = 30

Property: db.maxwait
Informational Note: Maximum time to wait before giving up if all connections in pool are busy (in milliseconds).

Property: db.maxidle
Example Value: db.maxidle = -1

Property: db.cleanDisabled
Informational Note: This is a developer-based setting which determines whether you are allowed to run "./dspace database clean" to completely delete all content and tables in your database. This should always be set to "true" in Production to protect against accidentally deleting all your content by running that command. (Default is set to true.)
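Putting the above together, a typical PostgreSQL setup might be overridden in local.cfg as sketched below. The URL and credentials are placeholders for illustration only:

```
# Hypothetical local.cfg database settings for PostgreSQL
db.url = jdbc:postgresql://localhost:5432/dspace
db.username = dspace
db.password = mypassword
# Optional pool tuning
db.maxconnections = 30
```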
If you are using Tomcat, then the object might be defined using a <Resource> element, or connected to a <Resource> child of <GlobalNamingResources> using a <ResourceLink> element. See your Servlet container's documentation for details of configuring the JNDI initial context. For example, Tomcat provides a useful JNDI Datasource How-to.
Earlier releases of DSpace provided a configuration property db.jndi to specify the name to be looked up, but that has been removed. The name is specified in config/spring/api/core-hibernate.xml if you really need to change it.
DSpace will look up a javax.mail.Session object in JNDI and, if found, will use that to send email. Otherwise it will create a Session using some of the
properties detailed below.
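If you choose the JNDI route under Tomcat, the Session might be defined in your webapp's context with a <Resource> element along the lines of the sketch below. The attribute values are illustrative, and your container's documentation is authoritative:

```xml
<!-- Hypothetical Tomcat <Context> fragment defining a JNDI mail Session.
     DSpace looks this up under java:comp/env/mail/Session by default. -->
<Context>
  <Resource name="mail/Session" auth="Container"
            type="javax.mail.Session"
            mail.smtp.host="smtp.myu.edu"
            mail.smtp.port="25"/>
</Context>
```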
Property: mail.server
Informational Note: The address on which your outgoing SMTP email server can be reached.

Property: mail.server.username
Informational Note: SMTP mail server authentication username, if required. This property is optional.

Property: mail.server.password
Informational Note: SMTP mail server authentication password, if required. This property is optional.

Property: mail.server.port
Example Value: mail.server.port = 25
Informational Note: The port on which your SMTP mail server can be reached. By default, port 25 is used. Change this setting if your SMTP mailserver is running on another port. This property is optional.
Property: mail.from.address
Informational Note: The "From" address for email. Change the 'myu.edu' to the site's host name.

Property: feedback.recipient
Informational Note: When a user clicks on the feedback link/feature, the information will be sent to the email address of choice. This configuration is currently limited to only one recipient. This is also the email address displayed on the contacts page.

Property: mail.admin
Informational Note: Email address of the general site administrator (Webmaster). System notifications/reports and other sysadmin emails are sent to this email address.

Property: mail.admin.name

Property: alert.recipient
Informational Note: Enter the recipient for server errors and alerts. This property is optional and defaults to the ${mail.admin} setting.

Property: registration.notify
Informational Note: Enter the recipient that will be notified when a new user registers on DSpace. This property is optional and defaults to no value.

Property: mail.charset
Example Value: mail.charset = UTF-8
Informational Note: Set the default mail character set. This may be over-ridden by providing a line inside the email template '#set($charset = "encoding")'. Otherwise this default is used.
Property: mail.allowed.referrers
Informational Note: A comma separated list of hostnames that are allowed to refer browsers to email forms. This property is optional. UNSUPPORTED in DSpace 7.0.

Property: mail.extraproperties
Example Value:
# Example which can fix "Could not convert socket to TLS" errors (i.e. SMTP over TLS)
mail.extraproperties = mail.smtp.socketFactory.port=587, \
                       mail.smtp.starttls.enable=true, \
                       mail.smtp.starttls.required=true, \
                       mail.smtp.ssl.protocols=TLSv1.2
Informational Note: If you need to pass extra settings to the Java mail library: comma separated, with an equals sign between each key and its value. This property is optional.

Property: mail.server.disabled
Informational Note: An option to disable the mail server. By default, this property is set to 'false'. By setting the value to 'true', DSpace will not send out emails. It will instead log the subject of the email which should have been sent. This is especially useful for development and test environments where production data is used when testing functionality. This property is optional.

Property: mail.session.name
Informational Note: Specifies the name of a javax.mail.Session object stored in JNDI under java:comp/env/mail. The default value is "Session".

Property: default.language
Informational Note: If no other language is explicitly stated in the submission-forms.xml, the default language will be attributed to the metadata values. See also Multilingual Support.
Property: mail.message.headers
Informational Note: When processing a message template, setting a Velocity variable whose name is one of the values of this configuration property will add or replace a message header of the same name, using the value of the variable as the header's value. See "Templates can set message headers".

Property: mail.welcome.enabled
Informational Note: Enable a "welcome letter" to the newly-registered user. By default this is false. See the welcome email template.
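Tying the mail settings together, a minimal outgoing-mail configuration in local.cfg might look like the sketch below. All addresses and hostnames are placeholders:

```
# Hypothetical local.cfg mail settings
mail.server = smtp.myu.edu
mail.server.port = 25
mail.from.address = [email protected]
feedback.recipient = [email protected]
mail.admin = [email protected]
# Uncomment to disable outgoing mail entirely in test environments
# mail.server.disabled = true
```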
Wording of E-mail Messages
The text of e-mail messages sent by DSpace can be customized by editing the template files in config/emails/. Each file is a Velocity template. You can use the full Velocity Template Language to help you customize messages. There are two Velocity variables pre-defined by DSpace when processing an e-mail template:
params is the array of message parameters provided by the DSpace code which is sending the message. These are indexed by number,
starting at zero.
config is the table of DSpace configuration properties (such as dspace.name). These are looked up using config.get(property name).
Note: You should replace the contact-information "[email protected] or call us at xxx-555-xxxx" with your own contact details in:
config/emails/change_password
config/emails/register
name: charset
meaning: Sets the charset parameter of the Content-Type: header of the bodypart, when there is a single bodypart. It also causes the subject value to be treated as being encoded in this charset. If not set, the charset defaults to US-ASCII as specified in RFC 2046. If there are multiple bodyparts, all are assumed to be encoded in US-ASCII and charset has no effect on them.
Sample message template
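As a minimal illustrative sketch (the subject line, wording, and parameter usage here are placeholders and assumptions, not the exact file shipped with DSpace), a template using the two pre-defined variables might look like:

```velocity
## Hypothetical e-mail template sketch in Velocity.
## params and config are the two variables DSpace pre-defines (see above).
Subject: ${config.get('dspace.name')}: your submission was received

Dear user,

Your submission titled "${params[0]}" was received on ${params[1]}.

The ${config.get('dspace.name')} Team
```

Here ${params[0]} and ${params[1]} stand in for whatever parameters the calling DSpace code actually supplies for a given message; consult the shipped templates in config/emails/ for the real parameter order of each message type.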
File Storage
Beginning with DSpace 6, your file storage location (aka bitstore) is defined in the [dspace]/config/spring/api/bitstore.xml Spring configuration file. By default it is defined as [dspace]/assetstore/. More information on modifying your file storage location can be found at Configuring the Bitstream Store in the Storage Layer documentation.
DSpace supports multiple options for storing your repository bitstreams (uploaded files). The files are not stored in the database; instead, they are provided via a configured "assetstore" or "bitstore".
By default, the assetstore is simply a directory on your server ([dspace]/assetstore/) under which bitstreams (files) are stored by DSpace.
At this time, DSpace supports two primary locations for storing your files:
1. Your local filesystem (used by default), specifically under the [dspace]/assetstore/ directory
2. OR, Amazon S3 (requires your own Amazon S3 account)
More information on configuring or customizing the storage location of your files can be found in the Storage Layer documentation.
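As an illustrative sketch only (the bean name and property names here are assumptions; consult the bitstore.xml shipped with your DSpace version for the exact structure), the default local filesystem store is wired up roughly like this:

```xml
<!-- Hypothetical sketch of a local bitstore bean definition.
     Verify names against your actual [dspace]/config/spring/api/bitstore.xml. -->
<bean name="localStore" class="org.dspace.storage.bitstore.DSBitStoreService">
    <!-- Directory under which bitstreams are stored -->
    <property name="baseDir" value="${dspace.dir}/assetstore"/>
</bean>
```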
Logging Configuration
Logging configuration has now moved to ${dspace.dir}/config/log4j2.xml

Property: log.init.config
Informational Note: This is where your logging configuration file is located. You may override the default log4j configuration by providing your own. Existing alternatives are:
log.init.config = ${dspace.dir}/config/log4j2.xml
log.init.config = ${dspace.dir}/config/log4j2-console.xml

Informational Note: This is where to put the DSpace logs. The default setting writes all DSpace logs to the ${dspace.dir}/log/ directory.

Property: loglevel.dspace
Informational Note: Log level for all DSpace-specific code (org.dspace.* packages). By default, DSpace only provides general INFO logs (in order to keep log sizes reasonable). As necessary, you can temporarily change this setting to any of the following (ordered from most information to least): DEBUG, INFO, WARN, ERROR, FATAL.
Please be aware that we do not recommend running at the DEBUG level in Production for significant periods of time, as it will cause the logs to become extremely large.

Property: loglevel.other
Example Value: loglevel.other = INFO
Informational Note: Log level for other third-party tools/APIs used by DSpace (non-DSpace-specific code). By default, DSpace only provides general INFO logs (in order to keep log sizes reasonable). As necessary, you can temporarily change this setting to any of the following (ordered from most information to least): DEBUG, INFO, WARN, ERROR, FATAL.
Please be aware that we do not recommend running at the DEBUG level in Production for significant periods of time, as it will cause the logs to become extremely large.
Property: plugin.classpath
Example Value: /opt/dspace/plugins/aPlugin.jar:/opt/dspace/moreplugins
Informational Note: Search path for third-party plugin classes. This is a colon-separated list of directories and JAR files, each of which will be searched for plugin classes after looking in all the places where DSpace classes are found. In this way you can designate one or more locations for plugin files which will not be affected by DSpace upgrades.
Property: handle.canonical.prefix
Example Value:
handle.canonical.prefix = http://hdl.handle.net/
handle.canonical.prefix = ${dspace.ui.url}/handle/
Informational Note: Canonical Handle URL prefix. By default, DSpace is configured to use http://hdl.handle.net/ as the canonical URL prefix when generating dc.identifier.uri during submission, and in the 'identifier' displayed on item record pages. If you do not subscribe to CNRI's handle service, you can change this to match the persistent URL service you use, or you can force DSpace to use your site's URL, e.g. handle.canonical.prefix = ${dspace.ui.url}/handle/. Note that this will not alter dc.identifier.uri metadata for existing items (only for subsequent submissions).

Property: handle.prefix
Example Value: handle.prefix = 1234.56789
Informational Note: The default installed by DSpace is 123456789, but you will replace this upon receiving a handle prefix from CNRI.

Property: handle.dir
Example Value: handle.dir = ${dspace.dir}/handle-server
Informational Note: The location shown in the Example Value is where DSpace will install the files used for the Handle Server.

Property: handle.additional.prefixes
Informational Note: List any additional prefixes that need to be managed by this handle server. For example, any handle prefixes that came from an external repository whose items have now been added to this DSpace. Multiple additional prefixes may be added in a comma-separated list.
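Pulling the handle settings above together, a local.cfg fragment could look like the following sketch (the prefix is the placeholder value from the examples, not a real registered prefix, and the second additional prefix is purely hypothetical):

```properties
handle.canonical.prefix = http://hdl.handle.net/
handle.prefix = 1234.56789
handle.dir = ${dspace.dir}/handle-server
# Only needed if this server also manages prefixes migrated from elsewhere
# (both values below are illustrative placeholders):
# handle.additional.prefixes = 1234.56789, 4321.98765
```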
Authorization to execute the functions that are allowed to users with WRITE permission on an object will also be granted to the ADMIN of that object (e.g. a community/collection admin will always be allowed to edit the metadata of the object). The default is "true" for all of these configurations.
Property: core.authorization.community-admin.create-subelement
Property: core.authorization.community-admin.delete-subelement
Community Administration: Policies and The group of administrators
Property: core.authorization.community-admin.policies
Informational Note: Authorization for a delegated community administrator to administrate the community
policies.
Property: core.authorization.community-admin.admin-group
Informational Note: Authorization for a delegated community administrator to edit the group of community
admins.
Property: core.authorization.community-admin.collection.policies
Informational Note: Authorization for a delegated community administrator to administrate the policies for
underlying collections.
Property: core.authorization.community-admin.collection.template-item
Informational Note: Authorization for a delegated community administrator to administrate the item template for
underlying collections.
Property: core.authorization.community-admin.collection.submitters
Informational Note: Authorization for a delegated community administrator to administrate the group of
submitters for underlying collections.
Property: core.authorization.community-admin.collection.workflows
Informational Note: Authorization for a delegated community administrator to administrate the workflows for
underlying collections.
Property: core.authorization.community-admin.collection.admin-group
Informational Note: Authorization for a delegated community administrator to administrate the group of
administrators for underlying collections.
Property: core.authorization.community-admin.item.delete
Informational Note: Authorization for a delegated community administrator to delete items in underlying
collections.
Property: core.authorization.community-admin.item.withdraw
Informational Note: Authorization for a delegated community administrator to withdraw items in underlying
collections.
Property: core.authorization.community-admin.item.reinstate
Informational Note: Authorization for a delegated community administrator to reinstate items in underlying
collections.
Property: core.authorization.community-admin.item.policies
Informational Note: Authorization for a delegated community administrator to administrate item policies in
underlying collections.
Community Administration: Bundles of Bitstreams, related to items owned by collections in the above Community
Property: core.authorization.community-admin.item.create-bitstream
Informational Note: Authorization for a delegated community administrator to create additional bitstreams in
items in underlying collections.
Property: core.authorization.community-admin.item.delete-bitstream
Informational Note: Authorization for a delegated community administrator to delete bitstreams from items in
underlying collections.
Property: core.authorization.community-admin.item.cc-license
Informational Note: Authorization for a delegated community administrator to administer licenses from items in
underlying collections.
Collection Administration:
The properties for collection administrators work similarly to those of community administrators, with respect to collection administration:
core.authorization.collection-admin.policies
core.authorization.collection-admin.template-item
core.authorization.collection-admin.submitters
core.authorization.collection-admin.workflows
core.authorization.collection-admin.admin-group

Collection Administration: Items owned by the above Collection.
The properties for collection administrators work similarly to those of community administrators, with respect to administration of items in underlying collections:
core.authorization.collection-admin.item.delete
core.authorization.collection-admin.item.withdraw
core.authorization.collection-admin.item.reinstatiate
core.authorization.collection-admin.item.policies

Collection Administration: Bundles of bitstreams, related to items owned by collections in the above Community.
The properties for collection administrators work similarly to those of community administrators, with respect to administration of bitstreams related to items in underlying collections:
core.authorization.collection-admin.item.create-bitstream
core.authorization.collection-admin.item.delete-bitstream
core.authorization.collection-admin.item-admin.cc-license

Item Administration: Bundles of bitstreams, related to items owned by collections in the above Community.
The properties for item administrators work similarly to those of community and collection administrators, with respect to administration of bitstreams related to items in underlying collections:
core.authorization.item-admin.create-bitstream
core.authorization.item-admin.delete-bitstream
core.authorization.item-admin.cc-license
Property: core.authorization.installitem.inheritance-read.append-mode
Example Value: core.authorization.installitem.inheritance-read.append-mode = false
Informational Note: Determines whether the DEFAULT READ policies of the collection should always be appended to the policies of the new item (property set to true) or used only when no other READ policy has been defined in the submission process (property set to false). Please note that, even in append mode, an open access default policy will NOT be inherited if other policies have been defined in the submission (i.e. if the item was restricted).
Login as feature

Property: webui.user.assumelogin
Informational Note: Determines if super administrators (those who are in the Administrators group) can log in as another user from the "edit eperson" page. This is useful for debugging problems in a running DSpace instance, especially in the workflow process. The default value is false, i.e. no one may assume the login of another user.
Property: harvest.includerestricted.rss
Informational Note: When set to 'true' (the default), items that do not have READ permission for the ANONYMOUS user will be included in RSS feeds anyway.

Property: harvest.includerestricted.subscription
Informational Note: When set to true (the default), items that do not have READ permission for the ANONYMOUS user will be included in Subscription emails anyway.
Proxy Settings
These proxy settings are commented out by default. Uncomment and specify both properties if a proxy server is required for external HTTP requests. Use the regular host name without a port number.

Property: http.proxy.host
Informational Note: Enter the host name without the port number. Currently only used for the Creative Commons licensing feature (to contact their API), and Sitemap generation (to ping search engines regarding updates).

Property: http.proxy.port
Informational Note: Enter the port number for the proxy server. Currently only used for the Creative Commons licensing feature (to contact their API), and Sitemap generation (to ping search engines regarding updates).

Property: useProxies
Example Value: useProxies = true
Informational Note: As of DSpace 7 (and above), this setting defaults to true. If "useProxies" is enabled, the authentication and statistics logging code will read the X-Forwarded-For header in order to determine the correct client IP address.
As the User Interface uses Angular Universal (for SEO support), the proxy server that comes with Angular Universal is always enabled. By default, only your local server (127.0.0.1) and the public IP address of `dspace.ui.url` are "trusted" as a proxy. If your DSpace instance is protected by an external proxy server, you may need to update the "proxies.trusted.ipranges" property below.
This also affects IPAuthentication, and should be enabled for that to work properly if your installation uses a proxy server.
Property: proxies.trusted.ipranges
Informational Note: By default, only proxies running on localhost (127.0.0.1) and the dspace.ui.url (public IP address) are "trusted". This allows our Angular User Interface to communicate with the REST API via a trusted proxy, which is required for Angular Universal (for SEO support).
You can specify a range by only listing the first three IP-address blocks, e.g. 128.177.243. You can list multiple IP addresses or ranges by comma-separating them.

Property: proxies.trusted.include_ui_ip
Informational Note: This setting specifies whether to automatically trust the IP address of the dspace.ui.url as a proxy. By default, this is always set to true to ensure the UI is fully trusted by the backend. However, if you are not using the Angular UI, you may choose to set this to "false" in order to only trust proxies running on localhost (127.0.0.1) by default.

Property: server.forward-headers-strategy
Informational Note: This is a Spring Boot setting which may be overridden/specified in your local.cfg. By default, Spring Boot does not automatically use X-Forwarded-* headers when generating links (and similar) in the REST API. When using a proxy in front of the REST API, you may need to modify this setting:
NATIVE = allows your web server to natively support standard Forwarded headers
FRAMEWORK = enables Spring Framework's built-in filter to manage these headers in Spring Boot. (This value may be useful to set for DSpace if you find that X-Forwarded headers are not working.)
NONE = default value. Forwarded headers are ignored.
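Taken together, a hypothetical local.cfg fragment applying the proxy settings discussed above might look like this sketch (the host, port, and IP range are illustrative placeholders, not recommended values):

```properties
# Outbound proxy for external HTTP requests (hypothetical host/port):
http.proxy.host = proxy.example.org
http.proxy.port = 3128

# Honor X-Forwarded-For when determining client IPs (default in DSpace 7):
useProxies = true

# Trust an additional external proxy range; three blocks = a whole range:
proxies.trusted.ipranges = 128.177.243

# If X-Forwarded headers are not being honored by the REST API:
server.forward-headers-strategy = FRAMEWORK
```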
Media Filters are configured as Named Plugins, with each filter also having a separate configuration setting (in dspace.cfg) indicating which formats it can process. The default configuration is shown below.

Property: filter.plugins
Example Value:
filter.plugins = PDF Text Extractor
filter.plugins = Html Text Extractor
filter.plugins = Word Text Extractor
filter.plugins = JPEG Thumbnail
Informational Note: This setting lists the names of all enabled MediaFilter or FormatFilter plugins. To enable multiple plugins, list them on separate lines (as shown above) or provide a comma-separated list.

Property: plugin.named.org.dspace.app.mediafilter.FormatFilter
Example Value:
plugin.named.org.dspace.app.mediafilter.FormatFilter = org.dspace.app.mediafilter.PDFFilter = PDF Text Extractor
plugin.named.org.dspace.app.mediafilter.FormatFilter = org.dspace.app.mediafilter.HTMLFilter = HTML Text Extractor
plugin.named.org.dspace.app.mediafilter.FormatFilter = org.dspace.app.mediafilter.WordFilter = Word Text Extractor
plugin.named.org.dspace.app.mediafilter.FormatFilter = org.dspace.app.mediafilter.JPEGFilter = JPEG Thumbnail
plugin.named.org.dspace.app.mediafilter.FormatFilter = org.dspace.app.mediafilter.BrandedPreviewJPEGFilter = Branded Preview JPEG
Informational Note: Assigns "human-understandable" names to each filter. These names are used to enable/disable plugins via the "filter.plugins" setting above. As with the previous setting, multiple plugins can be listed here on separate lines (or comma-separated).
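For instance, to additionally enable the Branded Preview JPEG filter named in the default configuration, the filter.plugins list could be written in comma-separated form; this sketch simply combines values already shown above (the backslash continues the line):

```properties
filter.plugins = PDF Text Extractor, Html Text Extractor, Word Text Extractor, \
                 JPEG Thumbnail, Branded Preview JPEG
```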
Property:
filter.org.dspace.app.mediafilter.PDFFilter.inputFormats
filter.org.dspace.app.mediafilter.HTMLFilter.inputFormats
filter.org.dspace.app.mediafilter.WordFilter.inputFormats
filter.org.dspace.app.mediafilter.JPEGFilter.inputFormats
filter.org.dspace.app.mediafilter.BrandedPreviewJPEGFilter.inputFormats
Example Value:
filter.org.dspace.app.mediafilter.PDFFilter.inputFormats = Adobe PDF
filter.org.dspace.app.mediafilter.HTMLFilter.inputFormats = HTML, Text
filter.org.dspace.app.mediafilter.WordFilter.inputFormats = Microsoft Word
filter.org.dspace.app.mediafilter.JPEGFilter.inputFormats = BMP, GIF, JPEG, image/png
filter.org.dspace.app.mediafilter.BrandedPreviewJPEGFilter.inputFormats = BMP, GIF, JPEG, image/png
Informational Note: Configures each filter's input format(s). These must match format names in the DSpace file format registry.

Property: filter.org.dspace.app.mediafilter.publicPermission
Informational Note: Optionally, configure filter(s) which should always create publicly accessible bitstreams (e.g. useful if you want thumbnails to always be publicly accessible). By default, any bitstreams created by a filter will inherit the same permissions as the original file (e.g. if the original image is access restricted, then the thumbnail will also be access restricted by default).
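A sketch of making thumbnails publicly readable, assuming the JPEGFilter plugin registered above is the one producing them (verify the filter name against your own configuration):

```properties
# Always grant anonymous READ on bitstreams created by the JPEG thumbnail filter:
filter.org.dspace.app.mediafilter.publicPermission = JPEGFilter
```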
Property: pdffilter.largepdfs
Example Value: pdffilter.largepdfs = true
Informational Note: If this value is set to "true", all PDF extractions are written to temp files as they are indexed. This is slower, but helps ensure that the PDFBox software DSpace uses does not eat up all your memory.

Property: pdffilter.skiponmemoryexception
Informational Note: If this value is set to "true", PDFs which still result in an "Out of Memory" error from PDFBox are skipped over. These problematic PDFs will never be indexed until memory usage can be decreased in the PDFBox software.
Names are assigned to each filter using the plugin.named.org.dspace.app.mediafilter.FormatFilter field (e.g. by default the PDFFilter is named "PDF Text Extractor").
Finally, the appropriate filter.<class path>.inputFormats defines the valid input formats to which each filter can be applied. These format names must match the short description field of the Bitstream Format Registry.
You can also implement more dynamic or configurable Media/Format Filters which extend SelfNamedPlugin .
For more information on Media/Format Filters, see the section on Mediafilters for Transforming DSpace Content.
For more information on using Packagers and Crosswalks, see the section on Importing and Exporting Content via Packages.
The value of this property is a path to a separate properties file containing the configuration for this crosswalk. The pathname is relative to the DSpace configuration directory, i.e. the config subdirectory of the DSpace install directory. Example from the dspace.cfg file:

Properties:
crosswalk.mods.properties.MODS
crosswalk.mods.properties.mods
Informational Note: This defines a crosswalk named MODS whose configuration comes from the file [dspace]/config/crosswalks/mods.properties. (In the above example, the lower-case name was added for OAI-PMH.)
The MODS crosswalk properties file is a list of properties describing how DSpace metadata elements are to be turned into elements of the MODS XML output document. The property name is a concatenation of the metadata schema, element name, and optionally the qualifier. For example, the contributor.author element in the native Dublin Core schema would be: dc.contributor.author. The value of the property is a line containing two segments separated by a vertical bar ("|"): the first part is an XML fragment which is copied into the output document; the second is an XPath expression describing where in that fragment to put the value of the metadata element. For example, in this property:
dc.contributor.author = <mods:name>
<mods:role>
<mods:roleTerm type="text">author</mods:roleTerm>
</mods:role>
<mods:namePart>%s</mods:namePart>
</mods:name>
Some of the examples include the string "%s" in the prototype XML where the text value is to be inserted. Don't pay any attention to it; it is an artifact that the crosswalk ignores. For example, given an author named Jack Florey, the crosswalk will insert
<mods:name>
<mods:role>
<mods:roleTerm type="text">author</mods:roleTerm>
</mods:role>
<mods:namePart>Jack Florey</mods:namePart>
</mods:name>
into the output document. Read the example configuration file for more details.
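Following the same fragment-plus-XPath pattern, a hypothetical additional mapping for titles (illustrative only; not taken from the shipped mods.properties) would be:

```properties
# XML fragment | XPath locating where the metadata value is inserted
dc.title = <mods:titleInfo><mods:title>%s</mods:title></mods:titleInfo> | mods:titleInfo/mods:title/text()
```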
XSLT-based Crosswalks
The XSLT crosswalks use XSL stylesheet transformation (XSLT) to transform an XML-based external metadata format to or from DSpace's internal
metadata. XSLT crosswalks are much more powerful and flexible than the configurable MODS and QDC crosswalks, but they demand some esoteric
knowledge (XSL stylesheets). Given that, you can create all the crosswalks you need just by adding stylesheets and configuration lines, without touching
any of the Java code.
Properties: crosswalk.submission.MODS.stylesheet
As shown above, the properties "key" is made up of four segments: the literal tokens crosswalk and submission, the plugin name, and the literal token stylesheet:
crosswalk.submission.PluginName.stylesheet =
You can make two different plugin names point to the same crosswalk, by adding two configuration entries with the same path:
crosswalk.submission.MyFormat.stylesheet = crosswalks/myformat.xslt
crosswalk.submission.almost_DC.stylesheet = crosswalks/myformat.xslt
The dissemination crosswalk must also be configured with an XML Namespace (including prefix and URI) and an XML schema for its output format. This is
configured on additional properties in the DSpace configuration:
crosswalk.dissemination.PluginName.namespace.Prefix = namespace-URI
crosswalk.dissemination.PluginName.schemaLocation = schemaLocation value
For example:
crosswalk.dissemination.qdc.namespace.dc = http://purl.org/dc/elements/1.1/
crosswalk.dissemination.qdc.namespace.dcterms = http://purl.org/dc/terms/
crosswalk.dissemination.qdc.schemalocation = http://purl.org/dc/elements/1.1/ \
http://dublincore.org/schemas/xmls/qdc/2003/04/02/qualifieddc.xsd
If you remove all XSLTDisseminationCrosswalks, please also disable the XSLTDisseminationCrosswalk in the list of self-named plugins. If no XSLTDisseminationCrosswalks are configured but the plugin is loaded, the PluginManager will log an error message ("Self-named plugin class "org.dspace.content.crosswalk.XSLTDisseminationCrosswalk" returned null or empty name list!").
[dspace]/bin/dspace dsrun org.dspace.content.crosswalk.XSLTDisseminationCrosswalk <plugin name> <handle>
[output-file]
For example, you can test the marc plugin on the handle 123456789/3 with:

[dspace]/bin/dspace dsrun org.dspace.content.crosswalk.XSLTDisseminationCrosswalk marc 123456789/3

Information from the script will be printed to stderr, while the XML output of the dissemination crosswalk will be printed to stdout. You can give a third parameter containing a filename to write the output into a file, but be careful: the file will be overwritten if it exists. When you are working on XSLT crosswalks it is very helpful to get the original XML on which the XSLT processor works. Use the crosswalk dim to get the original XML, e.g.:

[dspace]/bin/dspace dsrun org.dspace.content.crosswalk.XSLTDisseminationCrosswalk dim 123456789/3

Testing a submission crosswalk works in much the same way. The following command-line utility calls the crosswalk plugin to translate an XML document you submit, and displays the resulting intermediate XML (DIM). Invoke it with:
[dspace]/bin/dspace dsrun
org.dspace.content.crosswalk.XSLTIngestionCrosswalk [-l] <plugin name> <input-file>
where <plugin name> is the name of the crosswalk plugin to test (e.g. "LOM"), and <input-file> is a file containing an XML document of metadata in the
appropriate format.
Add the -l option to pass the ingestion crosswalk a list of elements instead of a whole document, as if the List form of the ingest() method had been
called. This is needed to test ingesters for formats like DC that get called with lists of elements instead of a root element.
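As a concrete sketch (the input filename is a hypothetical placeholder), testing an ingestion plugin named "LOM" on a file containing a list of elements could look like:

```
[dspace]/bin/dspace dsrun \
    org.dspace.content.crosswalk.XSLTIngestionCrosswalk -l LOM lom-fragment.xml
```

Without -l, the input file would instead need to contain a whole XML document with a single root element.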
Properties: crosswalk.qdc.namespace.qdc.dc
Properties: crosswalk.qdc.namespace.qdc.dcterms
Properties: crosswalk.qdc.schemaLocation.QDC
Example Value:
crosswalk.qdc.schemaLocation.QDC = http://www.purl.org/dc/terms \
http://dublincore.org/schemas/xmls/qdc/2006/01/06/dcterms.xsd \
http://purl.org/dc/elements/1.1 \
http://dublincore.org/schemas/xmls/qdc/2006/01/06/dc.xsd
Properties: crosswalk.qdc.properties.QDC
Informational Note: Configuration of the QDC Crosswalk dissemination plugin for Qualified DC. (Add a lower-case name for OAI-PMH; that is, change QDC to qdc.)

In the property key "crosswalk.qdc.properties.QDC" the value is a path to a separate properties file containing the configuration for this crosswalk. The pathname is relative to the DSpace configuration directory [dspace]/config. Referring back to the "Example Value" for this property key, one has crosswalks/qdc.properties, which defines a crosswalk named QDC whose configuration comes from the file [dspace]/config/crosswalks/qdc.properties.
You will also need to configure the namespaces and schema location strings for the XML output generated by this crosswalk. The namespace property names are formatted:
crosswalk.qdc.namespace.prefix = uri
where prefix is the namespace prefix and uri is the namespace URI. See the Property and Example Value keys above for how the default dspace.cfg is configured.
The QDC crosswalk properties file is a list of properties describing how DSpace metadata elements are to be turned into elements of the Qualified DC XML output document. The property name is a concatenation of the metadata schema, element name, and optionally the qualifier. For example, the contributor.author element in the native Dublin Core schema would be: dc.contributor.author. The value of the property is an XML fragment: the element whose value will be set to the value of the metadata field named in the property key.
The generated XML in the output document would then look like, e.g.:
<dcterms:temporal>Fall, 2005</dcterms:temporal>
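For instance, output of that shape would be produced by a mapping of roughly the following form. Treat this as a sketch: the exact metadata-field key in the shipped qdc.properties may differ.

```properties
# The element's text content is set to the metadata value, yielding e.g.
# <dcterms:temporal>Fall, 2005</dcterms:temporal>
dc.coverage.temporal = <dcterms:temporal />
```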
You can add names for existing crosswalks, add new plugin classes, and add new configurations for the configurable crosswalks as noted below.
You can add names for the existing plugins, and add new plugins, by altering these configuration properties. See the Plugin Manager architecture for more
information about plugins.
Property: event.dispatcher.default.class
Informational This is the default synchronous dispatcher (Same behavior as traditional DSpace).
Note:
Property: event.dispatcher.default.consumers
Informational This is the default synchronous dispatcher (Same behavior as traditional DSpace).
Note:
Property: event.dispatcher.noindex.class
Informational The noindex dispatcher will not create search or browse indexes (useful for batch item imports).
Note:
Property: event.dispatcher.noindex.consumers
Informational The noindex dispatcher will not create search or browse indexes (useful for batch item imports).
Note:
Property: event.consumer.discovery.class
Property: event.consumer.discovery.filters
Property: event.consumer.eperson.class
Property: event.consumer.eperson.filters
Property: event.consumer.test.class
Informational Test consumer for debugging and monitoring. Commented out by default.
Note:
Property: event.consumer.test.filters
Informational Test consumer for debugging and monitoring. Commented out by default.
Note:
Property: testConsumer.verbose
Informational Set this to true to enable testConsumer messages to standard output. Commented out by default.
Note:
Embargo
DSpace embargoes utilize standard metadata fields to hold both the "terms" and the "lift date". Which fields you use are configurable, and no specific
metadata element is dedicated or predefined for use in embargo. Rather, you specify exactly what field you want the embargo system to examine when it
needs to find the terms or assign the lift date.
Property: embargo.field.terms
Informational Note: Embargo terms will be stored in the item metadata. This property determines in which metadata field these terms will be stored. An example could be dc.embargo.terms.

Property: embargo.field.lift
Informational Note: The embargo lift date will be stored in the item metadata. This property determines in which metadata field the computed embargo lift date will be stored. You may need to create a DC metadata field in your Metadata Format Registry if it does not already exist. An example could be dc.embargo.liftdate.

Property: embargo.terms.open
Informational Note: You can determine your own values for the embargo.field.terms property (see above). This property determines the string value used for indefinite embargoes, i.e. the string in the terms field that indicates an indefinite embargo.

Property: plugin.single.org.dspace.embargo.EmbargoSetter
Informational Note: To implement the business logic to set your embargoes, you need to override the EmbargoSetter class. If you use the value DefaultEmbargoSetter, the default implementation will be used.

Property: plugin.single.org.dspace.embargo.EmbargoLifter
Informational Note: To implement the business logic to lift your embargoes, you need to override the EmbargoLifter class. If you use the value DefaultEmbargoLifter, the default implementation will be used.
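A sketch of a local.cfg embargo setup using the example field names mentioned above (the "forever" marker string is a hypothetical choice, and the fully-qualified class names are assumed to live in org.dspace.embargo):

```properties
# Metadata fields holding the embargo terms and the computed lift date
# (example field names from the notes above; create them in the registry first):
embargo.field.terms = dc.embargo.terms
embargo.field.lift = dc.embargo.liftdate

# Hypothetical marker string meaning "indefinite embargo":
embargo.terms.open = forever

# Use the default setter/lifter implementations:
plugin.single.org.dspace.embargo.EmbargoSetter = org.dspace.embargo.DefaultEmbargoSetter
plugin.single.org.dspace.embargo.EmbargoLifter = org.dspace.embargo.DefaultEmbargoLifter
```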
More Embargo Details
More details on Embargo configuration, including specific examples, can be found in the Embargo section of the documentation.
Property: plugin.single.org.dspace.checker.BitstreamDispatcher

Property: checker.retention.default
Informational Note: This option specifies the default time frame after which all checksum checks are removed from the database (defaults to 10 years). This means that after 10 years, all successful or unsuccessful matches are removed from the database.

Property: checker.retention.CHECKSUM_MATCH
Example Value: checker.retention.CHECKSUM_MATCH = 8w
Informational Note: This option specifies the time frame after which a successful match will be removed from your DSpace database (defaults to 8 weeks). This means that after 8 weeks, all successful matches are automatically deleted from your database (in order to keep that database table from growing too large).
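A retention sketch combining the two settings above (the "10y" duration syntax mirrors the "8w" example; verify the unit suffixes against your dspace.cfg comments):

```properties
# Keep successful checksum matches for 8 weeks, everything else for 10 years:
checker.retention.CHECKSUM_MATCH = 8w
checker.retention.default = 10y
```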
More Checksum Checking Details
For more information on using DSpace's built-in Checksum verification system, see the section on Validating CheckSums of Bitstreams.
It is possible for an authorized user to request a complete export and download of a DSpace item as a compressed zip file. This zip file may contain the following:
dublin_core.xml
license.txt
contents (listing of the contents)
the handle, the file itself, and the extract file if available

Property: org.dspace.app.itemexport.work.dir
Informational Note: The directory where the exports will be done and compressed.

Property: org.dspace.app.itemexport.download.dir
Informational Note: The directory where the compressed files will reside and be read by the downloader.

Property: org.dspace.app.itemexport.life.span.hours
Example Value: org.dspace.app.itemexport.life.span.hours = 48
Informational Note: The length of time in hours each archive should live for. When new archives are created, this entry is used to delete old ones.

Property: org.dspace.app.itemexport.max.size
Informational Note: The maximum size in megabytes (MB) that the export may be. This is enforced before compression. Each bitstream's size in each item being exported is added up; if the cumulative size is more than this entry, the export is not started.
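A sketch of the item-export settings together in local.cfg (the directory paths and the 200 MB cap are hypothetical placeholders):

```properties
# Where exports are assembled and compressed, and where downloads are served from
# (hypothetical paths):
org.dspace.app.itemexport.work.dir = ${dspace.dir}/exports
org.dspace.app.itemexport.download.dir = ${dspace.dir}/exports/download

# Delete export archives older than 48 hours:
org.dspace.app.itemexport.life.span.hours = 48

# Refuse exports whose uncompressed bitstreams total more than 200 MB (hypothetical cap):
org.dspace.app.itemexport.max.size = 200
```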
Subscription Emails
DSpace, through some advanced installation and setup, is able to send out an email for collections to which a user has subscribed. A user who is subscribed to a collection is emailed each time an item is added or modified. The following property key controls whether or not a user should be notified of a modification.

Property: eperson.subscription.onlynew
Informational Note: For backwards compatibility, the subscription emails by default include any modified items. This property key is COMMENTED OUT by default.
Hiding Metadata
It is possible to hide metadata from public consumption, so that it's only available to users with WRITE permissions on the Item. (NOTE: Prior to 7.6.1,
Administrator privileges were required for hidden metadata. This was modified to allow users to submit hidden metadata fields, as well as allow Community
/Collection Admins to see hidden metadata.)
Property: metadata.hide.dc.description.provenance
Informational Note: Hides the metadata field named in the property key above. Fields named here are hidden in the following places UNLESS the logged-in user has WRITE permissions on the Item:
1. REST API (and therefore the User Interface)
2. RDF (everywhere, as there is currently no possibility to authenticate)
3. OAI-PMH server (everywhere, as there is currently no possibility to authenticate)
To designate a field as hidden, add a property here in the form: metadata.hide.SCHEMA.ELEMENT.QUALIFIER = true. This default configuration hides the dc.description.provenance field, since that usually contains email addresses, which ought to be kept private and is mainly of interest to administrators.
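Hiding an additional field follows the same pattern. The first key below is the documented default; the second is a hypothetical example, not part of the default configuration:

```
# Default: hide provenance (contains email addresses)
metadata.hide.dc.description.provenance = true
# Hypothetical: also hide sponsorship notes from public view
metadata.hide.dc.description.sponsorship = true
```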
Property: webui.submit.upload.required
Informational Note: Whether or not a file is required to be uploaded during the "Upload" step in the submission process. The default is true. If set to "false", then the submitter (a human being) has the option to skip uploading a file.
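To make the file upload optional during submission, a sketch of the override (typically placed in local.cfg) would be:

```
# Allow submitters to skip the file-upload step (default is true)
webui.submit.upload.required = false
```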
Property: sherpa.romeo.url

Property: sherpa.romeo.apikey
Informational Note: Allows a specific API key to be used to raise the usage limit (500 calls/day for unregistered users).

The functionality relies on identifying which journal (ISSN) the item being submitted relates to. Out of the box this is done by looking at certain item metadata, but a different strategy can be used, for example looking at a metadata authority in the case that the Sherpa/RoMEO autocomplete for journals is used (see AuthorityControlSettings).
The strategy used to discover the journal related to the submission item is defined in the Spring file /config/spring/api/sherpa.xml
<bean class="org.dspace.app.sherpa.submit.SHERPASubmitConfigurationService"
      id="org.dspace.app.sherpa.submit.SHERPASubmitConfigurationService">
    <property name="issnItemExtractors">
        <list>
            <bean class="org.dspace.app.sherpa.submit.MetadataValueISSNExtractor">
                <property name="metadataList">
                    <list>
                        <value>dc.identifier.issn</value>
                    </list>
                </property>
            </bean>
            <!-- Use the following if you have the SHERPARoMEOJournalTitle enabled
            <bean class="org.dspace.app.sherpa.submit.MetadataAuthorityISSNExtractor">
                <property name="metadataList">
                    <list>
                        <value>dc.title.alternative</value>
                    </list>
                </property>
            </bean> -->
        </list>
    </property>
</bean>
Configuring Creative Commons License
The following configurations are for the Creative Commons license step in the submission process. Submitters are given an opportunity to select a Creative Commons license to accompany the item. Creative Commons licenses govern the use of the content. For further details, refer to the Creative Commons website at http://creativecommons.org .
Creative Commons licensing is optionally available and may be configured for any given collection that has a defined submission sequence, or be part of the "default" submission process. This process is described in the Submission User Interface section of this manual. There is a Creative Commons step already defined, but it is commented out, so enabling Creative Commons licensing is typically just a matter of uncommenting that step.
When enabled, the Creative Commons public API is utilized. This allows DSpace to store metadata references to the selected CC license, while also storing the CC license as a bitstream. The following CC license information is captured:
The URL of the CC license is stored in the "dc.rights.uri" metadata field (or whatever field is configured in the "cc.license.uri" setting below)
The name of the CC license is stored in the "dc.rights" metadata field (or whatever field is configured in the "cc.license.name" setting below). This only occurs if "cc.submit.setname = true" (the default value)
The RDF version of the CC license is stored in a bitstream named "license_rdf" in the CC-LICENSE bundle (as long as "cc.submit.addbitstream = true", which is the default value)
The following configurations (in dspace.cfg) relate to the Creative Commons license process:
Property: cc.api.rooturl
Informational Note: Generally you will never have to assign a different value - this is the base URL of the Creative Commons service API.

Property: cc.license.uri
Informational Note: The field that holds the Creative Commons license URI.

Property: cc.license.name
Informational Note: The field that holds the Creative Commons license Name.

Property: cc.submit.setname
Informational Note: If true, the license assignment will add the field configured with "cc.license.name" with the name of the CC license; if false, only the "cc.license.uri" field is added.

Property: cc.submit.addbitstream
Informational Note: If true, the license assignment will add a bitstream with the CC license RDF; if false, only metadata field(s) are added.

Property: cc.license.classfilter
Example Value: cc.license.classfilter = recombo,mark
Informational Note: This list defines the values that will be excluded from the license (class) selection list, as defined by the web service at the URL: http://api.creativecommons.org/rest/1.5/classes

Property: cc.license.jurisdiction
Example Value: cc.license.jurisdiction = nz
Informational Note: Should a jurisdiction be used? If so, which one? See http://creativecommons.org/international/ for a list of possible codes (e.g. nz = New Zealand, uk = England and Wales, jp = Japan). Commenting out this field will cause DSpace to select the latest, unported CC license (currently version 4.0). However, as Creative Commons 4.0 does not provide jurisdiction-specific licenses, if you specify this setting, your DSpace will continue to use the older, Creative Commons 3.0 jurisdiction licenses.

Property: cc.license.locale
Example Value: cc.license.locale = en
Informational Note: The locale to be used (in the form: language or language_country), e.g. "en" or "en_US". If no default locale is defined, the Creative Commons default locale will be used.
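Putting these together, a typical Creative Commons setup might look as follows. The metadata field names restate the defaults described above; the jurisdiction line is an illustrative assumption, not a default:

```
# Store the CC license URL and name in these metadata fields
cc.license.uri = dc.rights.uri
cc.license.name = dc.rights
cc.submit.setname = true
cc.submit.addbitstream = true
# Hypothetical: prefer New Zealand jurisdiction (forces CC 3.0 licenses)
cc.license.jurisdiction = nz
```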
Property: webui.licence_bundle.show
Informational Note: Sets whether to display the contents of the license bundle (often just the deposit license in the standard DSpace installation). UNSUPPORTED in DSpace 7.x.
Property: thumbnail.maxwidth
Informational Note: This property sets the maximum width of generated thumbnails that are displayed on item pages.

Property: thumbnail.maxheight
Informational Note: This property sets the maximum height of generated thumbnails that are displayed on item pages.

Property: webui.preview.maxwidth
Informational Note: This property sets the maximum width for the preview image. Only used for BrandedPreviewJPEGFilter.

Property: webui.preview.maxheight
Informational Note: This property sets the maximum height for the preview image. Only used for BrandedPreviewJPEGFilter.

Property: webui.preview.brand
Informational Note: This is the brand text that will appear with the image. Only used for BrandedPreviewJPEGFilter.
Property: webui.preview.brand.abbrev
Informational Note: An abbreviated form of the full branded name. This will be used when the preview image cannot fit the normal text. Only used for BrandedPreviewJPEGFilter.

Property: webui.preview.brand.height
Informational Note: The height (in px) of the brand. Only used for BrandedPreviewJPEGFilter.

Property: webui.preview.brand.font
Informational Note: This property sets the font for your brand text that appears with the image. Only used for BrandedPreviewJPEGFilter.

Property: webui.preview.brand.fontpoint
Informational Note: This property sets the font point (size) for your brand text that appears with the image. Only used for BrandedPreviewJPEGFilter.

Property: webui.preview.dc
Informational Note: The Dublin Core field that will display along with the preview. This field is optional. Only used for BrandedPreviewJPEGFilter.
Optionally, you can enable item counts to be displayed in the user interface for every Community and Collection. This uses the same configuration that was in place for DSpace 6 and earlier.

Property: webui.strengths.show
Informational Note: When "true", this will display the count of archived items (in the User Interface's browse screens). By default this is "false" (disabled). When enabled, the counts may be counted in real time, or fetched from the cache (see next option).

Property: webui.strengths.cache
Informational Note: When showing the strengths (i.e. item counts), should they be counted in real time, or fetched from the cache? Counts fetched in real time will perform an actual count of the index contents every time a page with this feature is requested, which may not scale. If this property is set to cache the counts ("true"), the counts will be cached on first load.
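Enabling cached item counts, as described above, would then look like this (both keys default to disabled/false):

```
# Show item counts in browse screens, served from the cache
webui.strengths.show = true
webui.strengths.cache = true
```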
Property: webui.browse.index.<n>
Informational Note: This is an example of how one "Defines the Indexes". See "Defining the Indexes" in the next sub-section.

Property: webui.itemlist.sort-option.<n>
Informational Note: This is an example of how one "Defines the Sort Options". See "Defining Sort Options" in the following sub-section.
SOLR Browse Engine (SOLR DAOs), default since DSpace 4.0 - This enables Apache Solr to be utilized as a backend for all browsing of DSpace. This option requires that you have Discovery (Solr search/browse engine) enabled in your DSpace.

Property: browseDAO.class
Informational Note: This property configures the Java class that is used for READ operations by the Browse System. You need to have Discovery enabled (this is the default since DSpace 4.0) to use the Solr Browse DAOs.
DSpace comes with four default indexes pre-defined: author, title, date issued, and subjects. Users may also define additional indexes or re-configure the
current indexes for different levels of specificity. For example, the default entries that appear in the dspace.cfg as default installation:
webui.browse.index.1 = dateissued:item:dateissued
webui.browse.index.2 = author:metadata:dc.contributor.*\,dc.creator:text
webui.browse.index.3 = title:item:title
webui.browse.index.4 = subject:metadata:dc.subject.*:text
#webui.browse.index.5 = dateaccessioned:item:dateaccessioned
There are two types of indexes provided in this default configuration:
"item" indexes, which have the format: webui.browse.index.<n> = <index-name>:item:<sort-type>:(asc | desc)
"metadata" indexes, which have the format: webui.browse.index.<n> = <index-name>:metadata:<comma-separated-list-of-metadata-fields>:(date | text):(asc | desc):<sort-type>
Please note that punctuation is critical when typing this property key in the dspace.cfg file. The following table explains each element:
webui.browse.index.<n>: n is the index number. The index numbers must start from 1 and increment continuously by 1 thereafter. Deviation from this will cause an error during install or a configuration update. So any time you add a new browse index, remember to increase the number. (Commented-out index numbers may be used over again.)

<index-name>: The name by which the index will be identified. In order for the DSpace UI to display a human-friendly description for this index, you'll need to update the UI's language packs (e.g. src/assets/i18n/en.json5) to include a key using this index name, for example:
browse.metadata.<index-name> = "MyField",
browse.metadata.<index-name>.breadcrumbs = "Browse by MyField",

<schema-prefix>: (Only for "metadata" indexes) The schema used for the field to be indexed. First part of a metadata field name. The default is dc (for Dublin Core).

<element>: (Only for "metadata" indexes) The schema element. Second part of a metadata field name. In Dublin Core, for example, the author element is referred to as "Contributor". The user should consult the default Dublin Core Metadata Registry table in Appendix A.

<qualifier>: (Only for "metadata" indexes) This is the qualifier to the <element> component. Third part of a metadata field name. The user has two choices: an asterisk "*" or a proper qualifier of the element. The asterisk is a wildcard and causes DSpace to index all types of the schema element. For example, if you have the element "contributor" and the qualifier "*", then you would index all contributor data regardless of the qualifier. As another example, the element "subject" with the qualifier "lcsh" would cause the indexing of only those fields that have the qualifier "lcsh". (This means you would only index Library of Congress Subject Headings and not all data elements that are subjects.)

<sort-type>: (Optional, should be set for "item" indexes) This refers to the sort type / data type of the field:
date: the index type will be treated as a date object and sorted as such
text: the index type will be treated as plain text and sorted as such
(Any other value refers to a custom <sort-type> which should be defined in a corresponding webui.itemlist.sort-option.<n> setting. See Defining Sort Options below for more information.)

<sort-order>: (Optional) The default sort order. Choose asc (ascending) or desc (descending). Ascending is the default value, but descending may be useful for date-based indexes (e.g. to display the most recent submissions first).
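As a concrete illustration of the "item" index syntax above, a site could enable a fifth index sorted newest-first. This is a hypothetical addition (the shipped configuration only contains a commented-out dateaccessioned index without an explicit sort order):

```
# Hypothetical: browse by date accessioned, most recent first
webui.browse.index.5 = dateaccessioned:item:dateaccessioned:desc
```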
Sort options/types will be available when browsing a list of items (either on "item" index type above or after selecting a specific value for "metadata"
indexes). You can define an arbitrary number of fields to sort on. For example, the default entries that appear in the dspace.cfg as default installation:
webui.itemlist.sort-option.1 = title:dc.title:title
webui.itemlist.sort-option.2 = dateissued:dc.date.issued:date
webui.itemlist.sort-option.3 = dateaccessioned:dc.date.accessioned:date
<sort-type-name>: The name by which the sort option will be identified. This is the name by which it is referred to in the "webui.browse.index" settings (see Defining the Indexes).

<schema-prefix>: The schema used for the field to be sorted on in the index. The default is dc (for Dublin Core).

<element>: The schema element. In Dublin Core, for example, the author element is referred to as "Contributor". The user should consult the default Dublin Core Metadata Registry table in Appendix A.

<qualifier>: This is the qualifier to the <element> component. The user has two choices: an asterisk "*" or a proper qualifier of the element.
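A custom sort option, referenced from a browse index by its <sort-type-name>, might be added like this (a hypothetical example, not a shipped default):

```
# Hypothetical: allow sorting item lists by series name, as plain text
webui.itemlist.sort-option.4 = series:dc.relation.ispartofseries:text
```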
Please note that when using another vocabulary, the UI's language packs (e.g. src/assets/i18n/en.json5) will need to be updated as well.
Starting with DSpace 7.6.1, these hierarchical "Browse By" options can be disabled via the below configuration:

Property: webui.browse.vocabularies.disabled
Example Value: webui.browse.vocabularies.disabled = srsc
Informational Note: By default, all controlled vocabularies used within your submission forms (submission-forms.xml) will be enabled in the Browse By menu of the User Interface. If you wish to disable any from display in the UI, you can list them in this configuration. Multiple values can be comma-separated (or this config can be repeated). Changes to this configuration will not take effect until your servlet engine (e.g. Tomcat) is restarted.
Informational Note: This enables/disables the display of frequencies (counts) in metadata browse. <n> refers to the browse configuration. By default, frequencies are shown for all metadata browse indexes.
Property: plugin.named.org.dspace.sort.OrderFormatDelegate
Example Value:
plugin.named.org.dspace.sort.OrderFormatDelegate = \
  org.dspace.sort.OrderFormatTitleMarc21=title
Informational Note: This sets the option for how the indexes are sorted. All sort normalizations are carried out by the OrderFormatDelegate. The plugin manager can be used to specify your own delegates for each datatype. The default datatypes (and delegates) are:
author = org.dspace.sort.OrderFormatAuthor
title = org.dspace.sort.OrderFormatTitle
text = org.dspace.sort.OrderFormatText
If you redefine a default datatype here, the configuration will be used in preference to the default. However, if you do not explicitly redefine a datatype, then the default will still be used in addition to the datatypes you do specify. As of DSpace release 1.5.2, the multi-lingual MARC21 title ordering is configured as the default, as shown in the example above. To use the previous title ordering (before release 1.5.2), comment out the configuration in your dspace.cfg file.
Tag cloud
Apart from the single (type=metadata) and full (type=item) browse pages, tag cloud is a new way to display the unique values of a metadata field.
To enable “tag cloud” browsing for a specific index you need to declare it in the dspace.cfg configuration file using the following option:
Property: webui.browse.index.tagcloud.<n>
Informational Note: Enable/disable tag cloud in browsing for a specific index. 'n' is the index number of the specific index, which needs to be of type 'metadata'. You do not have to re-index Discovery when you change this configuration.

The appearance configuration for the tag cloud is located in the Discovery XML configuration file (dspace/config/spring/api/discovery.xml). Without configuring the appearance, the default one will be applied to the tag cloud.
In this file, there must be a bean named "browseTagCloudConfiguration" of class "org.dspace.discovery.configuration.TagCloudConfiguration". This bean can have any of the following properties. If any is missing, the default value will be applied.
displayScore: Should the score of each tag be displayed next to it? Default: false
shouldCenter: Should the tags be displayed center-aligned in the page or left-aligned? Possible values: true | false. Default: true
totalTags: How many tags will be shown. The value -1 means all of them. Default: -1
Default: Case.PRESERVE_CASE
randomColors: Whether the 3 CSS classes of the tag cloud should be independent of score (random) or based on the score. Possible values: true | false. Default: true
fontFrom: The font size (in em) for the tag with the lowest score. Possible values: any decimal. Default: 1.1
fontTo: The font size (in em) for the tag with the highest score. Possible values: any decimal. Default: 3.2
cuttingLevel: Tags with a score lower than this will not appear in the tag cloud. Possible values: any integer from 1 to infinity. Default: 0
ordering: The ordering of the tags (based either on the name or the score of the tag). Default: Tag.GreekNameComparatorAsc
When the tag cloud is rendered, there are some CSS classes that you can change in order to adjust the tag cloud's appearance.
Property: webui.browse.link.<n>
Informational Note: This is used to configure which fields should link to other browse listings. This should be associated with the name of one of the browse indexes (webui.browse.index.<n>) with a metadata field listed in webui.itemlist.columns above. If this condition is not fulfilled, cross-linking will not work. Note also that cross-linking only works for metadata fields not tagged as title in webui.itemlist.columns.

The format of the property key is webui.browse.link.<n> = <index name>:<display column metadata>. Please notice the punctuation used between the elements.
<index name>: This needs to match your entry for the index name from the webui.browse.index property key.
webui.browse.link.1 = author:dc.contributor.*
  Creates a link for all types of contributors (authors, editors, illustrators, others, etc.)
webui.browse.link.2 = subject:dc.subject.lcsh
  Creates a link to subjects that are Library of Congress only. In this case, you have a browse index that contains only LC Subject Headings.
webui.browse.link.3 = series:dc.relation.ispartofseries
  Creates a link for the browse index "Series". Please note this is, again, a customized browse index and not part of the DSpace distributed release.
Property: plugin.named.org.dspace.content.license.LicenseArgumentFormatter
Example Value:
plugin.named.org.dspace.content.license.LicenseArgumentFormatter = \
  org.dspace.content.license.SimpleDSpaceObjectLicenseFormatter = collection, \
  org.dspace.content.license.SimpleDSpaceObjectLicenseFormatter = item, \
  org.dspace.content.license.SimpleDSpaceObjectLicenseFormatter = eperson
Informational Note: It is possible to include contextual information in the submission license using substitution variables. The text substitution is driven by a plugin implementation.
Please note that Syndication (RSS/Atom) feeds require that OpenSearch is enabled to function. When enabled, a syndication feed will be available on the
DSpace homepage (for entire site), and on each community/collection homepage (specific to that community/collection). Because Syndication Feeds use
OpenSearch, all OpenSearch settings also apply to Syndication Feeds.
Property: websvc.opensearch.enable
Informational Note: By default, OpenSearch & Syndication feeds are set to true (on). Change the key to "false" to disable. NOTE: this setting affects OpenSearch support as well.
Property: webui.feed.localresolve
Informational Note: By default (set to false), URLs returned by the feed will point at the global handle resolver (e.g. http://hdl.handle.net/123456789/1). If set to true, the local server URLs are used (e.g. http://myserver.myorg/handle/123456789/1).
Property: webui.feed.item.title
Informational Note: This property customizes each single-value field displayed in the feed information for each item. Each of the fields takes a single metadata field. The form of the key is <scheme prefix>.<element>.<qualifier>. In place of the qualifier, one may leave it blank to exclude any qualifiers or use the wildcard "*" to include all qualifiers for a particular element.

Property: webui.feed.item.date
Informational Note: This property customizes each single-value field displayed in the feed information for each item. Each of the fields takes a single metadata field. The form of the key is <scheme prefix>.<element>.<qualifier>. In place of the qualifier, one may leave it blank to exclude any qualifiers or use the wildcard "*" to include all qualifiers for a particular element.
Property: webui.feed.item.description
Example Value:
webui.feed.item.description = dc.title, dc.contributor.author, \
  dc.contributor.editor, dc.description.abstract, \
  dc.description
Informational Note: One can customize the metadata fields to show in the feed for each item's description. Elements are displayed in the order they are specified in dspace.cfg. Like other property keys, the format of this property key is: webui.feed.item.description = <scheme prefix>.<element>.<qualifier>. In place of the qualifier, one may leave it blank to exclude any qualifiers or use the wildcard "*" to include all qualifiers for a particular element.
Property: webui.feed.item.author
Informational Note: The name of the field to use for authors (Atom only); repeatable.
Property: webui.feed.logo.url
Informational Note: Customize the image icon included with the site-wide feeds. This must be an absolute URL.

Property: webui.feed.item.dc.creator
Informational Note: This optional property adds structured DC elements as XML elements to the feed description. They are not the same thing as, for example, webui.feed.item.description. Useful when a program or stylesheet will be transforming a feed and wants separate author, description, date, etc.

Property: webui.feed.item.dc.date
Informational Note: This optional property adds structured DC elements as XML elements to the feed description. They are not the same thing as, for example, webui.feed.item.description. Useful when a program or stylesheet will be transforming a feed and wants separate author, description, date, etc.

Property: webui.feed.item.dc.description
Informational Note: This optional property adds structured DC elements as XML elements to the feed description. They are not the same thing as, for example, webui.feed.item.description. Useful when a program or stylesheet will be transforming a feed and wants separate author, description, date, etc.
Property: webui.feed.podcast.collections
Example Value: webui.feed.podcast.collections = 1811/45183,1811/47223
Informational Note: This optional property enables Podcast Support on the RSS feed for the specified collection handles. The podcast is iTunes compatible and will expose the bitstreams in the items for viewing and download by the podcast reader. Multiple values are separated by commas. For more on using/enabling Media RSS Feeds to share content via iTunesU, see: Enable Media RSS Feeds

Property: webui.feed.podcast.communities
Informational Note: This optional property enables Podcast Support on the RSS feed for the specified community handles. The podcast is iTunes compatible and will expose the bitstreams in the items for viewing and download by the podcast reader. Multiple values are separated by commas. For more on using/enabling Media RSS Feeds to share content via iTunesU, see: Enable Media RSS Feeds

Property: webui.feed.podcast.mimetypes
Informational Note: This optional property for Podcast Support allows you to choose which MIME types of bitstreams are to be enclosed in the podcast feed. Multiple values are separated by commas. For more on using/enabling Media RSS Feeds to share content via iTunesU, see: Enable Media RSS Feeds

Property: webui.feed.podcast.sourceuri
Informational Note: This optional property for Podcast Support allows you to use a value from a metadata field as a replacement for the actual bitstreams to be enclosed in the RSS feed. A use case for specifying the external sourceuri would be if you have a non-DSpace media streaming server that has a copy of your media file that you would prefer to have the media streamed from. For more on using/enabling Media RSS Feeds to share content via iTunesU, see: Enable Media RSS Feeds
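A sketch of a combined podcast configuration follows. The collection handles repeat the example above; the MIME type and the metadata field are illustrative assumptions for a typical audio podcast:

```
# Podcast-enable these collections (handles from the example above)
webui.feed.podcast.collections = 1811/45183,1811/47223
# Hypothetical: only enclose MP3 bitstreams in the feed
webui.feed.podcast.mimetypes = audio/x-mpeg
# Hypothetical metadata field pointing at an external streaming copy
webui.feed.podcast.sourceuri = dc.source.uri
```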
OpenSearch Support
OpenSearch is a small set of conventions and documents for describing and using "search engines", meaning any service that returns a set of results for a
query. See extensive description in the Business Layer section of the documentation.
Please note that RSS/Atom feeds require that OpenSearch is enabled to function.
OpenSearch uses all the configuration properties for DSpace RSS to determine the mapping of metadata fields to feed fields. Note that a new field for
authors has been added (used in Atom format only).
Property: websvc.opensearch.enable
Informational Note: Whether or not OpenSearch is enabled. By default, the feature is enabled to support RSS/Atom feeds. Change to "false" to disable.

Property: websvc.opensearch.svccontext
Informational Note: The URL path where OpenSearch is made available on the backend. For example, "search" means it is available at ${dspace.server.url}/search

Property: websvc.opensearch.uicontext
Informational Note: Context for HTML request URLs. Change only for non-standard servlet mapping.

Property: websvc.opensearch.autolink

Property: websvc.opensearch.validity
Example Value: websvc.opensearch.validity = 48
Informational Note: Number of hours to retain results before recalculating. This applies to the Manakin interface only.

Property: websvc.opensearch.shortname
Informational Note: A short name used in browsers for the search service. It should be sixteen (16) or fewer characters.

Property: websvc.opensearch.longname

Property: websvc.opensearch.description

Property: websvc.opensearch.faviconurl
Informational Note: Location of the favicon for the service, if any. It must be 16 x 16 pixels. You can provide your own local favicon instead of the default.

Property: websvc.opensearch.samplequery
Informational Note: Sample query. This should return results. You can replace the sample query with search terms that should actually yield results in your repository.

Property: websvc.opensearch.tags
Informational Note: Tags used to describe the search service.

Property: websvc.opensearch.formats
Informational Note: Result formats offered. Use one or more comma-separated values from the list: html, atom, rss. Please note that html is required for auto-discovery in browsers to function, and must be first in the list if present.
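A minimal OpenSearch setup covering the keys above might therefore look like this (the short name is a placeholder assumption):

```
websvc.opensearch.enable = true
# Short name must be sixteen (16) or fewer characters
websvc.opensearch.shortname = MyRepository
# html must be first for browser auto-discovery
websvc.opensearch.formats = html,atom,rss
```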
Property: webui.content_disposition_threshold
Informational Note: The default filesize is set to 8MB. When a file/bitstream being viewed is larger than 8MB, the user's browser will download the file to their local machine and the user will have to open it manually. All files smaller than this threshold will be sent "inline" to the user's browser, allowing the browser to decide whether to open it within the browser or download it.
The value provided is always in bytes. For example: 4 MB = 4194304, 8 MB = 8388608, 16 MB = 16777216.
NOTE: This threshold is only applied if the file/bitstream does NOT match the "webui.content_disposition_format" list below.
Property: webui.content_disposition_format
Informational Note: Sets which file MIME types or file extensions will be forced to download, regardless of the "threshold" set above. Multiple values may be provided by setting this property several times, or by passing it a comma-separated list.
For example, setting this to "text/html, text/richtext" will ensure that all files/bitstreams matching those MIME types will always be downloaded (and never open inline in the user's browser).
File extensions may also be used to reference formats. For example, setting "pdf, xls" will ensure all files ending in ".pdf" or ".xls" will always be downloaded.
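For instance, a sketch restating the examples above (the 16 MB threshold is an illustrative choice, not the 8 MB default):

```
# Force download for anything over 16 MB (value is in bytes)
webui.content_disposition_threshold = 16777216
# Always download HTML, regardless of size
webui.content_disposition_format = text/html, text/richtext
```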
Property: webui.html.max-depth-guess
Example Value: webui.html.max-depth-guess = 3
Informational Note: When serving up composite HTML items in the UI, how deep can the request be for us to serve up a file with the same name? For example, if one receives a request for "foo/bar/index.html" and one has a bitstream called just "index.html", DSpace will serve up the former bitstream (foo/bar/index.html) for the request if webui.html.max-depth-guess is 2 or greater. If webui.html.max-depth-guess is 1 or less, then DSpace would not serve that bitstream, as the depth of the file is greater. If webui.html.max-depth-guess is zero, the request filename and path must always exactly match the bitstream name. The default is set to 3.
UNSUPPORTED IN DSpace 7.0
Sitemap Settings
To aid web crawlers in indexing the content within your repository, you can make use of sitemaps. For best SEO, sitemaps are enabled by default and updated automatically (see the cron setting below).

Property: sitemap.dir
Example Value: sitemap.dir = ${dspace.dir}/sitemaps

Property: sitemap.engineurls
Example Value: sitemap.engineurls = http://www.google.com/webmasters/sitemaps/ping?sitemap=
Informational Note: Comma-separated list of search engine URLs to "ping" when a new Sitemap has been created. Include everything except the Sitemap URL itself (which will be URL-encoded and appended to form the actual URL "pinged"). Add the following to the above parameter if you have an application ID with Yahoo: http://search.yahooapis.com/SiteExplorererService/V1/updateNotification?appid=REPLACE_ME?url=_ (replace the component _REPLACE_ME with your application ID). There is no known "ping" URL for MSN/Live search.
Property: sitemap.cron
Example Value: sitemap.cron = 0 15 1 * * ?
Informational Note: The DSpace sitemaps are regenerated on a regular basis based on the Cron syntax provided in this configuration. By default, sitemaps are updated daily at 1:15am local time. Cron syntax is defined at https://www.quartz-scheduler.org/api/2.3.0/org/quartz/CronTrigger.html. Remove (comment out) this config to disable the sitemap scheduler. The sitemap scheduler can also be disabled by setting this property to "-" (a single dash) in local.cfg.
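For example, to shift regeneration to a different time or disable the scheduler entirely, your local.cfg might contain one of the following (the time shown is an arbitrary example; the Quartz fields are second, minute, hour, day-of-month, month, day-of-week):

```
# Regenerate sitemaps daily at 4:30am local time
sitemap.cron = 0 30 4 * * ?

# ...or disable the sitemap scheduler entirely
# sitemap.cron = -
```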
For an in-depth description of this feature, please consult: Authority Control of Metadata Values
Property: plugin.named.org.dspace.content.authority.ChoiceAuthority
Example Value:
plugin.named.org.dspace.content.authority.ChoiceAuthority = \
 org.dspace.content.authority.SampleAuthority = Sample, \
 org.dspace.content.authority.SHERPARoMEOPublisher = SRPublisher, \
 org.dspace.content.authority.SHERPARoMEOJournalTitle = SRJournalTitle, \
 org.dspace.content.authority.SolrAuthority = SolrAuthorAuthority
Property: plugin.selfnamed.org.dspace.content.authority.ChoiceAuthority
Example Value:
plugin.selfnamed.org.dspace.content.authority.ChoiceAuthority = \
 org.dspace.content.authority.DCInputAuthority
Property: lcname.url
Informational Note: Please refer to the Sherpa/RoMEO Publishers Policy Database Integration section for details about such properties. See Configuring the Sherpa/RoMEO Publishers Policy Database Integration.
Property: orcid.api.url
Property: authority.minconfidence
Informational Note: This sets the default lowest confidence level at which a metadata value is included in an authority-controlled browse (and search) index. It is a symbolic keyword, one of the following values (listed in descending order): accepted, uncertain, ambiguous, notfound, failed, rejected, novalue, unset. See the org.dspace.content.authority.Choices source for descriptions.
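No example value is given above; as a sketch, accepting everything down to "ambiguous" (one of the keywords listed) would look like:

```
authority.minconfidence = ambiguous
```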
Property: default.locale
Example Value: default.locale = en
Informational Note: The default language for the application is set with this property key. This is a locale according to i18n and might consist of language, language_country or language_country_variant. If no default locale is defined, then the server default locale will be used. The format of a locale specifier is described here: http://java.sun.com/j2se/1.4.2/docs/api/java/util/Locale.html
Changes in dspace.cfg
Property: webui.supported.locales
Informational Note: All the locales that are supported by this instance of DSpace. Comma-separated list.
UNSUPPORTED IN DSpace 7.0.
Related Files
If you set webui.supported.locales make sure that all the related additional files for each language are available. LOCALE should correspond to the locale
set in webui.supported.locales, e.g. for webui.supported.locales = en, de, fr, there should be:
[dspace-source]/dspace/modules/server/src/main/resources/Messages.properties
[dspace-source]/dspace/modules/server/src/main/resources/Messages_en.properties
[dspace-source]/dspace/modules/server/src/main/resources/Messages_de.properties
[dspace-source]/dspace/modules/server/src/main/resources/Messages_fr.properties
Files to be localized:
[dspace-source]/dspace/modules/server/src/main/resources/Messages_LOCALE.properties
[dspace-source]/dspace/config/submission-forms_LOCALE.xml
[dspace-source]/dspace/config/default_LOCALE.license - should be pure ASCII
[dspace-source]/dspace/config/emails/change_password_LOCALE
[dspace-source]/dspace/config/emails/feedback_LOCALE
[dspace-source]/dspace/config/emails/internal_error_LOCALE
[dspace-source]/dspace/config/emails/register_LOCALE
[dspace-source]/dspace/config/emails/submit_archive_LOCALE
[dspace-source]/dspace/config/emails/submit_reject_LOCALE
[dspace-source]/dspace/config/emails/submit_task_LOCALE
[dspace-source]/dspace/config/emails/subscription_LOCALE
[dspace-source]/dspace/config/emails/suggest_LOCALE
Property: upload.temp.dir
Informational Note: This property sets where DSpace temporarily stores uploaded files.
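No example value is shown here; a typical setting (assuming the default install-directory layout, which includes an upload/ subdirectory) would be:

```
upload.temp.dir = ${dspace.dir}/upload
```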
Property: sfx.server.url
sfx.server.url = http://worldcatlibraries.org/registry/gateway?
Informational Note: The SFX query is appended to this URL. If this property is commented out or omitted, SFX support is switched off.
All the parameter mappings are defined in the [dspace]/config/sfx.xml file. The program will check the parameters in sfx.xml and retrieve the correct metadata of the item. It will then pass the string to your resolver.
For the following example, the program will search the first query-pair, which is the DOI of the item. If there is a DOI for that item, your retrieval result will be, for example:
http://researchspace.auckland.ac.nz/handle/2292/5763
<query-pairs>
<field>
<querystring>rft_id=info:doi/</querystring>
<dc-schema>dc</dc-schema>
<dc-element>identifier</dc-element>
<dc-qualifier>doi</dc-qualifier>
</field>
</query-pairs>
If there is no DOI for that item, it will search the next query-pair based on [dspace]/config/sfx.xml, and so on.
<querystring>rft_id=info:doi/</querystring>
The program assumes it will not receive an empty string for the item, as there will be at least an author and title for the item to pass to the resolver.
For dc.contributor.author, the program maintains the original DSpace SFX function of extracting the author's first and last name.
<field>
<querystring>rft.aulast=</querystring>
<dc-schema>dc</dc-schema>
<dc-element>contributor</dc-element>
<dc-qualifier>author</dc-qualifier>
</field>
<field>
<querystring>rft.aufirst=</querystring>
<dc-schema>dc</dc-schema>
<dc-element>contributor</dc-element>
<dc-qualifier>author</dc-qualifier>
</field>
The need for a limited set of keywords is important since it eliminates the ambiguity of a free description system, consequently simplifying the task of
finding specific items of information.
The controlled vocabulary add-on allows the user to choose from a defined set of keywords organized in a tree (taxonomy) and then use these keywords
to describe items while they are being submitted.
We have also developed a small search engine that displays the classification tree (or taxonomy) allowing the user to select the branches that best
describe the information that he/she seeks.
The taxonomies are described in XML following this (very simple) structure:
<node id="acmccs98" label="ACMCCS98">
<isComposedBy>
<node id="A." label="General Literature">
<isComposedBy>
<node id="A.0" label="GENERAL"/>
<node id="A.1" label="INTRODUCTORY AND SURVEY"/>
</isComposedBy>
</node>
</isComposedBy>
</node>
You are free to use any application you want to create your controlled vocabularies. A simple text editor should be enough for small projects. Bigger
projects will require more complex tools. You may use Protégé to create your taxonomies, save them as OWL, and then use an XML Stylesheet (XSLT) to
transform your documents to the appropriate format. Future enhancements to this add-on should make it compatible with standard schemas such as OWL
or RDF.
New vocabularies should be placed in [dspace]/config/controlled-vocabularies/ and must follow the structure described above. A
validation XML Schema (named controlledvocabulary.xsd) is also available in that directory.
Vocabularies need to be associated with the corresponding DC metadata fields. Edit the file [dspace]/config/input-forms.xml and place a "vocabu
lary" tag under the "field" element that you want to control. Set the value of the "vocabulary" element to the name of the file that contains the vocabulary,
leaving out the extension (the add-on will only load files with extension "*.xml"). For example:
<field>
<dc-schema>dc</dc-schema>
<dc-element>subject</dc-element>
<dc-qualifier></dc-qualifier>
<!-- An input-type of twobox MUST be marked as repeatable -->
<repeatable>true</repeatable>
<label>Subject Keywords</label>
<input-type>twobox</input-type>
<hint> Enter appropriate subject keywords or phrases below. </hint>
<required></required>
<vocabulary [closed="false"]>nsi</vocabulary>
</field>
The vocabulary element has an optional boolean attribute, closed, that can be used to force input to come only from the controlled-vocabulary add-on's
JavaScript widget. The default behavior (i.e. without this attribute) is equivalent to closed="false", which also allows the user to enter values freely.
In order to change the registries, you may adjust the XML files before the first installation of DSpace. On an already running instance it is recommended to
change bitstream registries via DSpace admin UI, but the metadata registries can be loaded again at any time from the XML files without difficulty. The
changes made via admin UI are not reflected in the XML files.
There is a set of Dublin Core Elements, which is used by the system and should not be removed or moved to another schema. See Appendix: Default
Dublin Core Metadata registry.
Note: altering a Metadata Registry has no effect on corresponding parts, e.g. item submission interface, item display, item import and vice versa. Every
metadata element used in submission interface or item import must be registered before using it.
Note also that deleting a metadata element will delete all its corresponding values.
If you wish to add more metadata elements, you can do this in one of two ways. Via the DSpace admin UI you may define new metadata elements in the
different available schemas. But you may also modify the XML file (or provide an additional one), and re-import the data as follows:
<dspace-dc-types>
<dc-type>
<schema>dc</schema>
<element>contributor</element>
<qualifier>advisor</qualifier>
<scope_note>Use primarily for thesis advisor.</scope_note>
</dc-type>
</dspace-dc-types>
The set of metadata registry files which is read by the MetadataImporter tool is configured by the metadata.registry.load property in dspace.cfg
or local.cfg. If you wish to use the importer to load a new metadata namespace from a new file, you will need to add the path to your new registry file
as an additional value of this property before running the tool.
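As a sketch (the registry file name here is hypothetical), appending an additional registry file in local.cfg could look like this, using the "+=" append syntax supported by the DSpace configuration scheme for multi-valued properties:

```
# local.cfg: load an additional (hypothetical) metadata registry file
metadata.registry.load += ${dspace.dir}/config/registries/my-local-types.xml
```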
The "Unknown" and "License" bitstream formats are used internally by DSpace and should not be removed from the Bitstream Format Registry.
Deleting a format will cause any existing bitstreams of this format to be reverted to the unknown bitstream format.
More than one such plugin may be configured – each will receive all usage events.
If you wish to write your own, it must extend the abstract class org.dspace.usage.AbstractUsageEventListener.
The property workflow.reviewer.file-edit controls whether files may be added/edited/removed during review (set to true) or whether files may only be
downloaded during review (set to false).
[dspace]/config/modules/workflow.cfg
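Based on the description above, the setting takes this form (the value shown is an example):

```
# Allow reviewers to add/edit/remove files during review; if false, files may
# only be downloaded during review
workflow.reviewer.file-edit = true
```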
The workflow system will send notifications about new items waiting to be reviewed to all EPersons that may resolve them. Tasks can be taken to prevent
two EPersons from working on the same task at the same time without knowing about each other. When an EPerson returns a task to the pool without resolving it (by
accepting or rejecting the submission), another e-mail is sent. In case you only want to be notified of completely new tasks entering a step of the workflow
system, you may switch off notifications on tasks returned to the pool by setting workflow.notify.returned.tasks to false in config/modules/workflow.cfg as
shown below:
[dspace]/config/modules/workflow.cfg
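Based on the description above, the setting takes this form:

```
# Only notify about new tasks; skip notifications for tasks returned to the pool
workflow.notify.returned.tasks = false
```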
In the spiders directory itself, you will find a number of files provided by iplists.com. These files contain network address patterns which have been
discovered to identify a number of known indexing services and other spiders. You can add your own files here if you wish to exclude more addresses that
you know of. You will need to include your files' names in the list configured in config/modules/solr-statistics.cfg. The iplists.com-*.txt
files can be updated using a tool provided by DSpace. See SOLR Statistics for details.
In the spiders directory you will also find two subdirectories. agents contains files filled with regular expressions, one per line. An incoming request's
User-Agent header is tested with each expression found in any of these files until an expression matches. If there is a match, the request is marked as
being from a spider, otherwise not. domains similarly contains files filled with regular expressions which are used to test the domain name from which the
request comes. You may add your own files of regular expressions to either directory if you wish to test requests with patterns of your own devising.
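As an illustrative sketch only (the file name and patterns are made up, and this is not DSpace's actual matching code), testing a User-Agent against a file of per-line patterns with grep might look like:

```shell
# Create a hypothetical agents file with one regular expression per line
mkdir -p spiders/agents
printf '%s\n' 'bot' 'crawler' 'spider' > spiders/agents/example.txt

# Flag the request as a spider if any pattern matches the User-Agent
# (-f reads patterns from the file; -i makes matching case-insensitive)
ua="Googlebot/2.1 (+http://www.google.com/bot.html)"
if printf '%s' "$ua" | grep -q -i -E -f spiders/agents/example.txt; then
  echo "spider"
else
  echo "not a spider"
fi
```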
--property, -p: the name of the desired configuration property. This option is required.
--module, -m: the name of the module in which the property is found. If omitted, the value of --property is the entire name. If used, the name will be composed as module.property. For example, "-m dspace -p url" will look up the value of dspace.url.
--raw, -r: if used, this prevents the substitution of other property values into the value of the requested property. It is also useful to see all of the property values when a specific property has an array of values (i.e. the configuration supports specifying multiple values); otherwise, by default, dsprop may only return the first value in the array.
-h, -?: display help.
DSpace Item State Definitions
Workspace item
An item that is under submission and active edit by an authorized user. The workspace item is visible only to the submitter and the system
administrators. (Currently there is no simple way to find/browse such items other than with the direct item ID or to use the supervisor functionality). Using
the supervisor functionality, a system admin can allow other authorized users to see/edit the item in the workspace state.
Self deposit
Collaboration over an in-progress submission for a small group of researchers. (This use case is implemented only with major limitations, using
the supervision feature – concurrency, lack of delegation: supervision must be defined by the system administrators, etc.)
Workflow Item
An item that is under review for quality control and policy compliance. The workflow item is visible to the original submitter (currently only basic metadata
are visible out-of-box in the mydspace summary list), users assigned to the specific workflow step where the item resides, and system
administrators. (Currently there is no simple way to find/browse such items other than with the direct item ID or to use the abort workflow functionality).
Quality control
Improvements to the bibliographic record (metadata available in workflow can be different than those asked of the submitter)
Check of policy / copyright
Withdrawn item
A withdrawn item has been removed from the archive. However, it is still available to Administrative users (and may optionally be restored to the
archive at a later date). A withdrawn item disappears from DSpace (except from Administrative screens) and the item appears to be deleted.
Staging area for item to be removed when copyright issues arise with publisher. If the copyright issue is confirmed, the item will be permanently
deleted or kept in the withdrawn state for future reference.
Logical deletion delegated to community/collection admin, where permanent deletion is reserved to system administrators
Logical deletion, where permanent deletion is not an option for an organization
Removal of an old version of an item, forcing redirect to a new up-to-date version of the item (this use case is not currently implemented out-of-
box in DSpace)
By design, withdrawing an item is reversible. As an administrator, you can reverse the withdrawing of an item, through the action "reinstate". As a
mechanism to support this, a resource policy state "WITHDRAWN_READ" was introduced.
When an item is withdrawn, all READ policies associated with the item and its underlying bundles and bitstreams are changed into WITHDRAWN_READ
policies. This achieves 2 things:
1. The READ policy information in itself is still preserved, and can get switched back to normal READ policies if the item gets reinstated.
2. As long as the item is withdrawn, those WITHDRAWN_READ policies should not give any users or groups read rights.
WITHDRAWN_READ was introduced in DS-3097 after it was observed that even though an item was withdrawn, the related bitstreams were still
accessible.
A non-discoverable (or "private") item is one that is simply hidden from all search/browse/OAI results, and is therefore only accessible via direct link (or
bookmark). By default, all Items are discoverable, meaning they will appear in search/browse/OAI results.
It's important to clarify that non-discoverable items may or may not be access restricted. It is possible for an Item to be anonymously visible, but non-
discoverable, so that you can only access the item if you are given a link to it.
This state should only refer to the discoverable nature of the item. A non-discoverable (or "private") item will not be included in any system that aims to
help users to find items. So it will not appear in:
Browse
Recent submission
Search result
OAI-PMH (at least for the ListRecords and ListIdentifiers verbs; though the OAI-PMH specification is not clear about inconsistent implementations of
the ListRecords and GetRecord verbs)
REST list and search methods
It should be accessible under the actual Authorization Policies of DSpace using direct URL or query method such as:
OAI-PMH GetRecord verb
REST direct access /rest/item/<item-id> or equivalent
Provide a light rights awareness feature where discovery is not enabled for search and/or browse
Hide “special items” such as repository presentations, guides or support materials
Hide an old version of an Item in cases where real versioning is not appropriate or desired
Hide specific types of item such as “Item used to record Journal record: Journal Title, ISSN, Publisher etc.” used as authority file for metadata (dc.
relation.ispartof) of “normal item”
Archived/Published item
An item that is in a stable state, available in the repository under the defined Authorization Policies. Changes to these items are possible only for a
restricted group of users (administrators) and should produce versioning according to the Institution's policy.
Embargoed Item
Embargoed items are a special case of Archived/Published items. The item has some time-based access policy attached to it and/or the underlying bitstreams. Specifically,
read permission for someone (EPerson Group) starting from a defined date. Typically embargo is applied to the bitstreams so that "fulltext" has initially
very limited access (normally administrators or other "repository staff" groups) and only after a defined date will the fulltext become visible to all users
(Anonymous group). This scenario is used to implement typical "embargo requirements" from publishers -- see Delayed Open Access.
If the metadata of the item should be visible only to a specific group of users, it is possible to define an embargo policy also for the ITEM itself. A READ
policy for a specific group will mean that only the users in that group will be able to access the item splash page. Note that the DSpace REST API & UI
is fully rights aware (see Discovery documentation for more information, especially the section on "Access Rights Awareness"), meaning that an
embargoed item is hidden automatically until the embargo expires.
Directories and Files
1 Overview
2 Source Directory Layout
3 Installed Directory Layout
4 Contents of Server Web Application
5 Log Files
5.1 log4j2.xml File.
Overview
A complete DSpace installation consists of three separate directory trees:
The source directory:: This is where (surprise!) the source code lives. Note that the config files here are used only during the initial install
process. After the install, config files should be changed in the install directory. It is referred to in this document as [dspace-source].
The install directory:: This directory is populated during the install process and also by DSpace as it runs. It contains config files, command-line
tools (and the libraries necessary to run them), and usually -- although not necessarily -- the contents of the DSpace archive (depending on how
DSpace is configured). After the initial build and install, changes to config files should be made in this directory. It is referred to in this document
as [dspace].
The web deployment directory:: This directory is generated by the web server the first time it finds a dspace.war file in its webapps directory. It
contains the unpacked contents of dspace.war, i.e. the JSPs and java classes and libraries necessary to run DSpace. Files in this directory
should never be edited directly; if you wish to modify your DSpace installation, you should edit files in the source directory and then rebuild. The
contents of this directory aren't listed here since its creation is completely automatic. It is usually referred to in this document as [tomcat]/webapps
/dspace.
[dspace]
assetstore/ - assetstore files. This is where all the files uploaded into DSpace are stored by default. See Storage Layer.
bin/ - shell scripts for DSpace command-line tasks. Primary among them is the 'dspace' commandline utility
config/ - configuration, with sub-directories as above
etc/ - Administrative and database management files
exports/ - temporary storage for any export packages
handle-server/ - Handles server files and configuration
imports/ - temporary storage for any import packages
lib/ - JARs, including dspace-api.jar, containing the DSpace classes
log/ - Log files
reports/ - Reports generated by statistical report generator
solr/ - Solr search/browse indexes
triplestore/ - RDF triple store index files (when enabled)
upload/ - temporary directory used during file uploads etc.
webapps/ - location where DSpace installs all Web Applications
server/
index.html - Root page of the third party HAL Browser (used to browse/search REST API)
login.html - (Custom) Login page for HAL Browser (supporting DSpace authentication plugins)
js/ - Javascript overrides for HAL Browser (main HAL Browser code is brought in via Spring REST dependencies)
Log Files
The first source of potential confusion is the log files. Since DSpace uses a number of third-party tools, problems can occur in a variety of places. Below is
a table listing the main log files used in a typical DSpace setup. The locations given are defaults, and might be different for your system depending on
where you installed DSpace and the third-party tools. The ordering of the list is roughly the recommended order for searching them for the details about a
particular problem or error.
[dspace]/log/dspace.log.yyyy-mm-dd: Main DSpace log file. This is where the DSpace code writes a simple log of events and errors that occur within the DSpace code. You can control the verbosity of this by editing the [dspace-source]/config/templates/log4j.properties file and then running "ant init_configs".
[dspace]/log/handle-plugin.log: The Handle server runs as a separate process from the DSpace Web UI (which runs under Tomcat's JVM). Due to a limitation of log4j's 'rolling file appenders', the DSpace code running in the Handle server's JVM must use a separate log file. The DSpace code that is run as part of a Handle resolution request writes log information to this file. You can control the verbosity of this by editing [dspace-source]/config/templates/log4j-handle-plugin.properties.
[dspace]/log/handle-server.log: This is the log file for CNRI's Handle server code. If a problem occurs within the Handle server code, before DSpace's plug-in is invoked, this is where it may be logged.
[tomcat]/logs/catalina.out: This is where Tomcat's standard output is written. Many errors that occur within the Tomcat code are logged here. For example, if Tomcat can't find the DSpace code (dspace.jar), it would be logged in catalina.out.
[tomcat]/logs/hostname_log.yyyy-mm-dd.txt: If you're running Tomcat stand-alone (without Apache), it logs some information and errors for specific Web applications to this log file. hostname will be your host name (e.g. dspace.myu.edu) and yyyy-mm-dd will be the date.
[tomcat]/logs/apache_log.yyyy-mm-dd.txt: If you're using Apache, Tomcat logs information about Web applications running through Apache (mod_webapp) in this log file (yyyy-mm-dd being the date).
[apache]/error_log: Apache logs to this file. If there is a problem with getting mod_webapp working, this is a good place to look for clues. Apache also writes to several other log files, though error_log tends to contain the most useful information for tracking down problems.
PostgreSQL log: PostgreSQL also writes a log file. This one doesn't seem to have a default location; you probably had to specify it yourself at some point during installation. In general, this log file rarely contains pertinent information -- PostgreSQL is pretty stable; you're more likely to encounter problems with connecting via JDBC, and these problems will be logged in dspace.log.
log4j2.xml File.
The file [dspace]/config/log4j2.xml controls how and where log files are created. There are two sets of configurations in that file, called A1 and A2. These
are used to control the logs for DSpace (as a whole) and the checksum checker, respectively. As implied by the name, this configuration uses Log4j v2. For
more information on syntax, see https://logging.apache.org/log4j/2.x/manual/configuration.html
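As an illustrative sketch only (the appender name A1 comes from the text above, but the file pattern, layout, and logger names here follow generic Log4j 2 conventions rather than the shipped file's exact contents), an appender/logger pairing in a log4j2.xml has this general shape:

```xml
<Configuration>
  <Appenders>
    <!-- A1: a daily-rolling file appender for the main DSpace log -->
    <RollingFile name="A1" filePattern="${log.dir}/dspace.log-%d{yyyy-MM-dd}">
      <PatternLayout pattern="%d %-5p %c @ %m%n"/>
      <TimeBasedTriggeringPolicy/>
    </RollingFile>
  </Appenders>
  <Loggers>
    <!-- Adjust the level here to control DSpace log verbosity -->
    <Logger name="org.dspace" level="INFO"/>
    <Root level="WARN">
      <AppenderRef ref="A1"/>
    </Root>
  </Loggers>
</Configuration>
```

Consult the shipped [dspace]/config/log4j2.xml for the actual appender definitions before making changes.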
Metadata and Bitstream Format Registries
1 Default Dublin Core Metadata Registry (DC)
2 Dublin Core Terms Registry (DCTERMS)
3 Local Metadata Registry (local)
4 Default Bitstream Format Registry
contributor A person, organization, or service responsible for the content of the resource. Catch-all for unspecified contributors.
contributor editor
contributor illustrator
contributor other
date available¹ Date or date range item became available to the public.
date created Date of creation or manufacture of intellectual content if different from date.issued.
identifier Catch-all for unambiguous identifiers not defined by qualified form; use identifier.other for a known identifier common to
a local collection instead of unqualified form.
identifier citation² Human-readable, standard bibliographic citation of non-DSpace format of this item
description provenance¹ The history of custody of the item since its creation, including any changes successive custodians made to it.
description sponsorship² Information about sponsoring agencies, individuals, or contractual arrangements for the item.
description statementofresponsibility To preserve statement of responsibility from MARC records.
language Catch-all for non-ISO forms of the language of the item, accommodating harvested values.
language iso² Current ISO standard for language of intellectual content, including country codes (e.g. "en_US").
relation¹ ispartofseries Series name and number within that series, if available.
relation requires Referenced resource is required to support function, delivery, or coherence of item.
subject classification Catch-all for value from local classification system; global classification systems will receive a specific qualifier.
subject other Local controlled vocabulary; global vocabularies will receive a specific qualifier.
title alternative² Varying (or substitute) form of title proper appearing in item, e.g. abbreviation or translation
¹ Used by several functional areas of DSpace. DO NOT REMOVE WITHOUT INVESTIGATING THE CONSEQUENCES
² This field is included in the default DSpace Submission User Interface. Removing this field from your registry will break the default DSpace submission
form.
Dublin Core Terms Registry (DCTERMS)
The Dublin Core Terms (DCTERMS) registry was introduced in DSpace 4. This registry initializes an optional metadata schema, where dcterms is used to
identify the namespace. In DSpace 4, none of these fields are used by any of the system functionality out of the box. The registry and schema were added
as a first step to facilitate the future migration of the DSpace-specific DC schema to this schema, which complies with current Dublin Core standards.
The main advantage of the DCTERMS schema is that no field name details get lost during harvesting, as opposed to harvesting of so-called "simple"
Dublin Core, where the qualifiers from the above schema are omitted during harvesting.
As this registry is meant to track the Dublin Core Terms standard, it's recommended that the local DSpace administrator not add/remove metadata fields
from this namespace; the "local" namespace should be used instead (see below).
accessRights Information about who can access the resource or an indication of its security status. May include information regarding access or
restrictions based on privacy, security, or other policies.
available Date (often a range) that the resource became or will become available.
bibliographicCitation Recommended practice is to include sufficient bibliographic detail to identify the resource as unambiguously as possible.
contributor An entity responsible for making contributions to the resource. Examples of a Contributor include a person, an organization, or a
service.
coverage The spatial or temporal topic of the resource, the spatial applicability of the resource, or the jurisdiction under which the resource is
relevant.
date A point or period of time associated with an event in the lifecycle of the resource.
educationLevel A class of entity, defined in terms of progression through an educational or training context, for which the described resource is intended.
hasFormat A related resource that is substantially the same as the pre-existing described resource, but in another format.
hasPart A related resource that is included either physically or logically in the described resource.
hasVersion A related resource that is a version, edition, or adaptation of the described resource.
instructionalMethod A process, used to engender knowledge, attitudes and skills, that the described resource is designed to support.
isFormatOf A related resource that is substantially the same as the described resource, but in another format.
isPartOf A related resource in which the described resource is physically or logically included.
isReferencedBy A related resource that references, cites, or otherwise points to the described resource.
isReplacedBy A related resource that supplants, displaces, or supersedes the described resource.
isRequiredBy A related resource that requires the described resource to support its function, delivery, or coherence.
isVersionOf A related resource of which the described resource is a version, edition, or adaptation.
license A legal document giving official permission to do something with the resource.
mediator An entity that mediates access to the resource and for whom the resource is intended or useful.
provenance A statement of any changes in ownership and custody of the resource since its creation that are significant for its authenticity, integrity,
and interpretation.
references A related resource that is referenced, cited, or otherwise pointed to by the described resource.
replaces A related resource that is supplanted, displaced, or superseded by the described resource.
requires A related resource that is required by the described resource to support its function, delivery, or coherence.
application/mathematica Mathematica Mathematica Notebook Known false ma
application/pdf Adobe PDF Adobe Portable Document Format Known false pdf
application/sgml SGML SGML application (RFC 1874) Known false sgm, sgml
application/vnd.ms-project Microsoft Project Microsoft Project Known false mpd, mpp, mpx
audio/x-aiff AIFF Audio Interchange File Format Known false aif, aifc, aiff
image/jpeg JPEG Joint Photographic Experts Group/JPEG File Interchange Format (JFIF) Known false jpeg, jpg
image/tiff TIFF Tag Image File Format Known false tif, tiff
video/mpeg MPEG Moving Picture Experts Group Known false mpe, mpeg, mpg
¹ Used by several functional areas of DSpace. DO NOT REMOVE WITHOUT INVESTIGATING THE CONSEQUENCES
Architecture
1 Overview
Overview
The DSpace system is organized into three layers, each of which consists of a number of components.
Application Layer - All external/public facing interfaces/tools. These include the Web User Interface, REST API, OAI-PMH, RDF, and SWORD (v1
and v2) interfaces. Also includes the Command Line interface, and various tools that can be used to import/export data to/from DSpace.
Business Logic Layer - Primarily the Java API layer ([dspace-source]/dspace-api and dspace-services), which provides the core business logic
for all the various application interfaces.
Storage Layer - A subset of the dspace-api (org.dspace.storage.* classes) whose role is to manage all content storage (metadata, relationships,
bitstreams) for all business layer objects. This layer provides access to a relational database (usually PostgreSQL or Oracle) via Hibernate ORM,
using FlywayDB for migrations/updates. It also defines a custom BitStoreService for storing files (bitstreams) via storage plugins (currently
supporting filesystem storage or Amazon S3 storage).
Each layer only invokes the layer below it; the application layer may not use the storage layer directly, for example. Each component in the storage and
business logic layers has a defined public API. The union of the APIs of those components is referred to as the Storage API (in the case of the storage
layer), the DSpace Java API (in the case of the business logic layer), and the DSpace REST API (in the case of the application layer). In the
Application Layer, it is worth noting that the Web User Interface only accesses DSpace via the REST API.
It is important to note that each layer is trusted. Although the logic for authorising actions is in the business logic layer, the system relies on individual
applications in the application layer to correctly and securely authenticate e-people. If a 'hostile' or insecure application were allowed to invoke the Java
API directly, it could very easily perform actions as any e-person in the system.
The reason for this design choice is that authentication methods will vary widely between different applications, so it makes sense to leave the logic and
responsibility for that in these applications.
The source code is organized to adhere strictly to this three-layer architecture.
The storage and business logic layer APIs are extensively documented with Javadoc-style comments. Generate the HTML version of these by entering the
[dspace-source]/dspace directory and running:
mvn javadoc:javadoc
The resulting documentation will be at [dspace-source]/dspace-api/target/site/apidocs/index.html. The package-level documentation of each package
usually contains an overview of the package and some example usage. This information is not repeated in this architecture document; this document and the
Javadoc APIs are intended to be used in parallel.
The REST API provides not only JavaDocs, but also a public contract. See REST API.
Storage Layer
RDBMS
Bitstream Store
Business Logic Layer
Core Classes
Content Management API
Workflow System
Administration Toolkit
E-person/Group Manager
Authorisation
Handle Manager/Handle Plugin
Search
Browse API
History Recorder
Checksum Checker
Application Layer
Web User Interface
OAI-PMH Data Provider
Item Importer and Exporter
Transferring Items Between DSpace Instances
Registration
METS Tools
Media Filters
Sub-Community Management
Application Layer
The following explains the components of the Application Layer.
Web UI Files
The web User Interface code is managed in a separate GitHub Project:
https://github.com/DSpace/dspace-angular/
Quick setup and configuration instructions can be found in the README of that project.
REST API
This component defines the main public API of the Application Layer. See REST API section of the documentation.
Business Logic Layer
1 Core Classes
1.1 The Configuration Service
1.2 Constants
1.3 Context
1.4 Email
1.5 LogManager
1.6 Utils
2 Content Management API
2.1 Other Classes
2.2 Modifications
2.3 What's In Memory?
2.4 Dublin Core Metadata
2.5 Support for Other Metadata Schemas
2.6 Packager Plugins
3 Plugin Service
3.1 Concepts
3.2 Using the Plugin Service
3.2.1 Types of Plugin
3.2.2 Self-Named Plugins
3.2.3 Obtaining a Plugin Instance
3.2.4 Lifecycle Management
3.2.5 Getting Meta-Information
3.3 Implementation
3.3.1 LegacyPluginServiceImpl Class
3.3.2 SelfNamedPlugin Class
3.3.3 Errors and Exceptions
3.4 Configuring Plugins
3.4.1 Configuring Singleton (Single) Plugins
3.4.2 Configuring Sequence of Plugins
3.4.3 Configuring Named Plugins
3.5 Use Cases
3.5.1 Managing the MediaFilter plugins transparently
3.5.2 A Singleton Plugin
3.5.3 Plugin that Names Itself
3.5.4 Stackable Authentication
4 Workflow System
5 Administration Toolkit
6 E-person/Group Manager
7 Authorization
7.1 Special Groups
7.2 Miscellaneous Authorization Notes
8 Handle Manager/Handle Plugin
9 Search
9.1 Harvesting API
10 Browse API
10.1 Using the API
11 Checksum checker
12 OpenSearch Support
13 Embargo Support
13.1 What is an Embargo?
13.2 Embargo Model and Life-Cycle
Core Classes
The org.dspace.core package provides some basic classes that are used throughout the DSpace code.
The Configuration Service
The system is configured by editing the relevant files in [dspace]/config, as described in the configuration section.
When editing configuration files for applications that DSpace uses, such as Apache Tomcat, you may want to edit the copy in
[dspace-source] and then run ant update or ant overwrite_configs rather than editing the 'live' version directly! This will ensure you have a
backup copy of your modified configuration files, so that they are not accidentally overwritten in the future.
[dspace]/bin/dspace dsprop property.name
This writes the value of property.name from dspace.cfg to the standard output, so that shell scripts can access the DSpace configuration. If the property has no value, nothing is written.
Constants
This class contains constants that are used to represent types of object and actions in the database. For example, authorization policies can relate to
objects of different types, so the resourcepolicy table has columns resource_id, which is the internal ID of the object, and resource_type_id, which
indicates whether the object is an item, collection, bitstream etc. The value of resource_type_id is taken from the Constants class, for example Constants.
ITEM.
Here are some of the most commonly used constants you might come across:
DSpace types
Bitstream: 0
Bundle: 1
Item: 2
Collection: 3
Community: 4
Site: 5
Group: 6
Eperson: 7
DSpace actions
Read: 0
Write: 1
Delete: 2
Add: 3
Remove: 4
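The values above can be sketched as a plain Java class. This is only an illustrative stand-in for org.dspace.core.Constants, and the describePolicy helper is hypothetical; it shows how a resourcepolicy row's resource_type_id and action_id pair is interpreted:

```java
// Illustrative stand-in for org.dspace.core.Constants; only the values
// listed in this section are mirrored, and describePolicy() is a
// hypothetical helper, not part of the real class.
public class ConstantsSketch {

    // DSpace object types (stored in resourcepolicy.resource_type_id)
    public static final int BITSTREAM = 0;
    public static final int BUNDLE = 1;
    public static final int ITEM = 2;
    public static final int COLLECTION = 3;
    public static final int COMMUNITY = 4;
    public static final int SITE = 5;
    public static final int GROUP = 6;
    public static final int EPERSON = 7;

    // DSpace actions (stored in resourcepolicy.action_id)
    public static final int READ = 0;
    public static final int WRITE = 1;
    public static final int DELETE = 2;
    public static final int ADD = 3;
    public static final int REMOVE = 4;

    // Interpret a resourcepolicy row: e.g. resource_type_id = 2 and
    // action_id = 0 means "READ on ITEM".
    public static String describePolicy(int resourceTypeId, int actionId) {
        String[] types = {"BITSTREAM", "BUNDLE", "ITEM", "COLLECTION",
                          "COMMUNITY", "SITE", "GROUP", "EPERSON"};
        String[] actions = {"READ", "WRITE", "DELETE", "ADD", "REMOVE"};
        return actions[actionId] + " on " + types[resourceTypeId];
    }

    public static void main(String[] args) {
        System.out.println(describePolicy(ITEM, READ)); // READ on ITEM
    }
}
```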
Context
The Context class is central to DSpace operation. Any code that wishes to use any API in the business logic layer must first create itself a Context
object. This is akin to opening a connection to a database (which is in fact one of the things that happens).
A context object is involved in most method calls and object constructors, so that the method or object has access to information about the current
operation. When the context object is constructed, the following information is automatically initialized:
A connection to the database. This is a transaction-safe connection. i.e. the 'auto-commit' flag is set to false.
A cache of content management API objects. Each time a content object is created (for example Item or Bitstream) it is stored in the Context
object. If the object is then requested again, the cached copy is used. Apart from reducing database use, this addresses the problem of having
two copies of the same object in memory in different states.
The following information is also held in a context object, though it is the responsibility of the application creating the context object to fill it out
correctly:
You should always abort a context if any error happens during its lifespan; otherwise the data in the system may be left in an inconsistent state. You can
also commit a context, which means that any changes are written to the database, and the context is kept active for further use.
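The commit/abort discipline described above can be sketched as follows. FakeContext is a self-contained stand-in for org.dspace.core.Context; its complete() and abort() methods mirror the real ones, and everything else here is hypothetical:

```java
// Self-contained sketch of the Context lifecycle described above.
// FakeContext stands in for org.dspace.core.Context: complete() writes
// pending changes and abort() discards them.
public class ContextLifecycleSketch {

    public static class FakeContext {
        private final java.util.List<String> pending = new java.util.ArrayList<>();
        private final java.util.List<String> committed = new java.util.ArrayList<>();

        public void record(String change) { pending.add(change); }

        // complete(): flush all pending changes to "storage"
        public void complete() { committed.addAll(pending); pending.clear(); }

        // abort(): throw pending changes away so storage stays consistent
        public void abort() { pending.clear(); }

        public java.util.List<String> committedChanges() { return committed; }
    }

    // Typical usage pattern: complete on success, always abort on error.
    public static FakeContext run(boolean failMidway) {
        FakeContext context = new FakeContext();
        try {
            context.record("rename bitstream 1234");
            if (failMidway) {
                throw new RuntimeException("simulated error");
            }
            context.complete();   // changes reach storage
        } catch (RuntimeException e) {
            context.abort();      // nothing reaches storage
        }
        return context;
    }

    public static void main(String[] args) {
        System.out.println(run(false).committedChanges());
        System.out.println(run(true).committedChanges());
    }
}
```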
Email
Sending e-mails is pretty easy. Just use the configuration manager's getEmail method, set the arguments and recipients, and send.
The e-mail texts are stored in [dspace]/config/emails. They are processed by the standard java.text.MessageFormat. At the top of each e-mail are
listed the appropriate arguments that should be filled out by the sender. Example usage is shown in the org.dspace.core.Email Javadoc API
documentation.
LogManager
The log manager consists of a method that creates a standard log header, and returns it as a string suitable for logging. Note that this class does not
actually write anything to the logs; the log header returned should be logged directly by the sender using an appropriate Log4J call, so that information
about where the logging is taking place is also stored.
The level of logging can be configured on a per-package or per-class basis by editing [dspace]/config/log4j.properties. You will need to stop
and restart Tomcat for the changes to take effect.
A typical log entry looks like this:
Action view_item
The above format allows the logs to be easily parsed and analyzed. The [dspace]/bin/log-reporter script is a simple tool for analyzing logs. Try:
[dspace]/bin/log-reporter --help
It's a good idea to 'nice' this log reporter to avoid an impact on server performance.
Utils
Utils contains miscellaneous utility methods that are required in a variety of places throughout the code, and thus have no particular 'home' in a subsystem.
Content Management API
Classes corresponding to the main elements in the DSpace data model (Community, Collection, Item, Bundle and Bitstream) are sub-classes of the
abstract class DSpaceObject. The Item object handles the Dublin Core metadata record.
Each class generally has one or more static find methods, which are used to instantiate content objects. Constructors do not have public access and are
just used internally. The reasons for this are:
"Constructing" an object may be misconstrued as the action of creating an object in the DSpace system; for example, one might expect invoking a
constructor to construct a brand new item in the system, rather than simply instantiating an in-memory instance of an object already in the system.
find methods may often be called with invalid IDs, and return null in such a case. A constructor would have to throw an exception in this case. A
null return value from a static method can in general be dealt with more simply in code.
If an instantiation representing the same underlying archival entity already exists, the find method can simply return that same instantiation to
avoid multiple copies and any inconsistencies which might result.
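A minimal sketch of the find pattern described by these three points, with a private constructor, a null return for invalid IDs, and a cache that returns the same instance each time. All class and method names here are hypothetical stand-ins for the org.dspace.content classes:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Sketch of the static-find pattern described above: no public
// constructor, null for an unknown ID, and a single cached in-memory
// instance per underlying object. All names are hypothetical.
public class FindPatternSketch {

    public static class FakeItem {
        public final int id;
        private FakeItem(int id) { this.id = id; }  // no public constructor
    }

    // stands in for the Context object's cache of content objects
    private static final Map<Integer, FakeItem> cache = new HashMap<>();

    // IDs that "exist" in storage for this example
    private static final Set<Integer> stored = Set.of(1234, 5678);

    public static FakeItem find(int id) {
        if (!stored.contains(id)) {
            return null;   // invalid ID: return null rather than throw
        }
        // the same instantiation is returned on every call for this ID
        return cache.computeIfAbsent(id, FakeItem::new);
    }
}
```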
Collection, Bundle and Bitstream do not have create methods; rather, one has to create an object using the relevant method on the container. For
example, to create a collection, one must invoke createCollection on the community that the collection is to appear in.
The primary reason for this is for determining authorization. In order to know whether an e-person may create an object, the system must know which
container the object is to be added to. It makes no sense to create a collection outside of a community, and the authorization system does not have a
policy for that.
Items are first created in the form of an implementation of InProgressSubmission. An InProgressSubmission represents an item under construction; once it
is complete, it is installed into the main archive and added to the relevant collection by the InstallItem class. The org.dspace.content package provides an
implementation of InProgressSubmission called WorkspaceItem; this is a simple implementation that contains some fields used by the Web submission UI.
The org.dspace.workflow package also contains an implementation called WorkflowItem which represents a submission undergoing a workflow process.
In the previous chapter there is an overview of the item ingest process which should clarify the previous paragraph. Also see the section on the workflow
system.
Community and BitstreamFormat do have static create methods; one must be a site administrator to have authorization to invoke these.
Other Classes
Classes whose name begins DC are for manipulating Dublin Core metadata, as explained below.
The FormatIdentifier class attempts to guess the bitstream format of a particular bitstream. Presently, it does this simply by looking at any file extension in
the bitstream name and matching it up with the file extensions associated with bitstream formats. Hopefully this can be greatly improved in the future!
The ItemIterator class allows items to be retrieved from storage one at a time, and is returned by methods that may return a large number of items, more
than would be desirable to have in memory at once.
The ItemComparator class is an implementation of the standard java.util.Comparator that can be used to compare and order items based on a particular
Dublin Core metadata field.
Modifications
When creating, modifying or for whatever reason removing data with the content management API, it is important to know when changes happen in-
memory, and when they occur in the physical DSpace storage.
Primarily, one should note that no change made using a particular org.dspace.core.Context object will actually be made in the underlying storage unless
complete or commit is invoked on that Context. If anything should go wrong during an operation, the context should always be aborted by invoking abort, to
ensure that no inconsistent state is written to the storage.
Additionally, some changes made to objects only happen in-memory. In these cases, invoking the update method lines up the in-memory changes to occur
in storage when the Context is committed or completed. In general, methods that change any metadata field only make the change in-memory; methods
that involve relationships with other objects in the system line up the changes to be committed with the context. See individual methods in the API Javadoc.
The new name will not be stored since update was not invoked
Context context = new Context();
Bitstream b = Bitstream.find(context, 1234);
b.setName("newfile.txt");
context.complete();
The bitstream will be included in the bundle, since update doesn't need to be called
Context context = new Context();
Bitstream bs = Bitstream.find(context, 1234);
Bundle bnd = Bundle.find(context, 5678);
bnd.add(bs);
context.complete();
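Complementing the two examples above, the sketch below shows the rule as runnable code: a metadata change is lost unless update() queues it before the context completes. All classes here are self-contained stand-ins, not the real DSpace API:

```java
// Self-contained sketch of the update-before-complete rule: a metadata
// change stays in-memory until update() queues it and the context's
// complete() flushes it. These classes are stand-ins, not the real
// DSpace API.
public class UpdateRuleSketch {

    public static class FakeContext {
        private final java.util.List<String> queued = new java.util.ArrayList<>();
        private final java.util.List<String> stored = new java.util.ArrayList<>();

        public void complete() { stored.addAll(queued); queued.clear(); }
        public java.util.List<String> storedChanges() { return stored; }
    }

    public static class FakeBitstream {
        private final FakeContext context;
        private String name;

        public FakeBitstream(FakeContext context, String name) {
            this.context = context;
            this.name = name;
        }

        public void setName(String newName) { this.name = newName; } // in-memory only
        public void update() { context.queued.add("name=" + name); } // queue for storage
    }

    public static FakeContext renameWithUpdate() {
        FakeContext context = new FakeContext();
        FakeBitstream b = new FakeBitstream(context, "oldfile.txt");
        b.setName("newfile.txt");
        b.update();            // without this call the rename would be lost
        context.complete();
        return context;
    }

    public static void main(String[] args) {
        System.out.println(renameWithUpdate().storedChanges());
    }
}
```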
What's In Memory?
Instantiating some content objects also causes other content objects to be loaded into memory.
Instantiating a Bitstream object causes the appropriate BitstreamFormat object to be instantiated. Of course the Bitstream object does not load the
underlying bits from the bitstream store into memory!
Instantiating a Bundle object causes the appropriate Bitstream objects (and hence BitstreamFormats) to be instantiated.
Instantiating an Item object causes the appropriate Bundle objects (etc.) and hence BitstreamFormats to be instantiated. All the Dublin Core metadata
associated with that item are also loaded into memory.
The reasoning behind this is that for the vast majority of cases, anyone instantiating an item object is going to need information about the bundles and
bitstreams within it, and this methodology allows that to be done in the most efficient way and is simple for the caller. For example, in the Web UI, the
servlet (controller) needs to pass information about an item to the viewer (JSP), which needs to have all the information in-memory to display the item
without further accesses to the database which may cause errors mid-display.
You do not need to worry about multiple in-memory instantiations of the same object, or any inconsistencies that may result; the Context object keeps a
cache of the instantiated objects. The find methods of classes in org.dspace.content will use a cached object if one exists.
It may be that in enough cases this automatic instantiation of contained objects reduces performance in situations where it is important; if this proves to be
true the API may be changed in the future to include a loadContents method or somesuch, or perhaps a Boolean parameter indicating what to do will be
added to the find methods.
When a Context object is completed, aborted or garbage-collected, any objects instantiated using that context are invalidated and should not be used (in
much the same way an AWT button is invalid if the window containing it is destroyed).
Dublin Core Metadata
Below is the specific syntax that DSpace expects various fields to adhere to:

Element | Qualifier | Syntax | Helper class
date | Any or unqualified | ISO 8601 in the UTC time zone, with either year, month, day, or second precision. Examples: 2000, 2002-10, 2002-08-14, 1999-01-01T14:35:23Z | DCDate
contributor | Any or unqualified | In general last name, then a comma, then first names, then any additional information like "Jr.". If the contributor is an organization, then simply the name. Examples: Doe, John; Smith, John Jr.; van Dyke, Dick; Massachusetts Institute of Technology | DCPersonName
language | iso | A two letter code taken from ISO 639, followed optionally by a two letter country code taken from ISO 3166. Examples: en, fr, en_US | DCLanguage
relation | ispartofseries | The series name, followed by a semicolon followed by the number in that series. Alternatively, just free text. Examples: MIT-TR; 1234, My Report Series; ABC-1234, NS1234 | DCSeriesNumber
The MetadataField class describes a metadata field by schema, element and optional qualifier. The value of a MetadataField is described by a MetadataVa
lue which is roughly equivalent to the older Metadatum class. Finally the MetadataSchema class is used to describe supported schemas. The DC schema
is supported by default. Refer to the javadoc for method details.
Packager Plugins
The Packager plugins let you ingest a package to create a new DSpace Object, and disseminate a content Object as a package. A package is simply a
data stream; its contents are defined by the packager plugin's implementation.
To ingest an object, which is currently only implemented for Items, the caller invokes the packager plugin's ingest operation on the package data stream.
Plugin Service
In DSpace 6, the old "PluginManager" was replaced by org.dspace.core.service.PluginService which performs the same activities/actions.
The PluginService is a very simple component container. It creates and organizes components (plugins), and helps select a plugin in the cases where
there are many possible choices. It also gives some limited control over the life cycle of a plugin.
Concepts
The following terms are important in understanding the rest of this section:
Plugin Interface A Java interface, the defining characteristic of a plugin. The consumer of a plugin asks for its plugin by interface.
Plugin a.k.a. Component, this is an instance of a class that implements a certain interface. It is interchangeable with other implementations, so
that any of them may be "plugged in", hence the name. A Plugin is an instance of any class that implements the plugin interface.
Implementation class The actual class of a plugin. It may implement several plugin interfaces, but must implement at least one.
Name Plugin implementations can be distinguished from each other by name, a short String meant to symbolically represent the implementation
class. They are called "named plugins". Plugins only need to be named when the caller has to make an active choice between them.
SelfNamedPlugin class Plugins that extend the SelfNamedPlugin class can take advantage of additional features of the Plugin Manager. Any
class can be managed as a plugin, so it is not necessary, just possible.
Reusable Reusable plugins are only instantiated once, and the Plugin Manager returns the same (cached) instance whenever that same plugin is
requested again. This behavior can be turned off if desired.
Types of Plugin
The Plugin Service supports three different patterns of usage:
1. Singleton Plugins There is only one implementation class for the plugin. It is indicated in the configuration. This type of plugin chooses an
implementation of a service, for the entire system, at configuration time. Your application just fetches the plugin for that interface and gets the
configured-in choice. See the getSinglePlugin() method.
2. Sequence Plugins You need a sequence or series of plugins, to implement a mechanism like Stackable Authentication or a pipeline, where each
plugin is called in order to contribute its implementation of a process to the whole. The Plugin Manager supports this by letting you configure a
sequence of plugins for a given interface. See the getPluginSequence() method.
3. Named Plugins Use a named plugin when the application has to choose one plugin implementation out of many available ones. Each
implementation is bound to one or more names (symbolic identifiers) in the configuration. The name is just a string to be associated with the
combination of implementation class and interface. It may contain any characters except for comma (,) and equals (=). It may contain embedded
spaces. Comma is a special character used to separate names in the configuration entry. Names must be unique within an interface: No plugin
classes implementing the same interface may have the same name. Think of plugin names as a controlled vocabulary – for a given plugin
interface, there is a set of names for which plugins can be found. The designer of a Named Plugin interface is responsible for deciding what the
name means and how to derive it; for example, names of metadata crosswalk plugins may describe the target metadata format. See the
getNamedPlugin() and getAllPluginNames() methods.
Self-Named Plugins
Named plugins can get their names either from the configuration or, for a variant called self-named plugins, from within the plugin itself.
Self-named plugins are necessary because one plugin implementation can be configured itself to take on many "personalities", each of which deserves its
own plugin name. It is already managing its own configuration for each of these personalities, so it makes sense to allow it to export them to the Plugin
Manager rather than expecting the plugin configuration to be kept in sync with its own configuration.
An example helps clarify the point: There is a named plugin that does crosswalks, call it CrosswalkPlugin. It has several implementations that crosswalk
some kind of metadata. Now we add a new plugin which uses XSL stylesheet transformation (XSLT) to crosswalk many types of metadata – so the single
plugin can act like many different plugins, depending on which stylesheet it employs.
This XSLT-crosswalk plugin has its own configuration that maps a Plugin Name to a stylesheet – it has to, since of course the Plugin Manager doesn't
know anything about stylesheets. It becomes a self-named plugin, so that it reads its configuration data, gets the list of names to which it can respond, and
passes those on to the Plugin Manager.
When the Plugin Service creates an instance of the XSLT-crosswalk, it records the Plugin Name that was responsible for that instance. The plugin can
look at that Name later in order to configure itself correctly for the Name that created it. This mechanism is all part of the SelfNamedPlugin class which is
part of any self-named plugin.
A sequence plugin is returned as an array of Objects since it is actually an ordered list of plugins.
Lifecycle Management
When PluginService fulfills a request for a plugin, a new instance is always created.
Getting Meta-Information
The PluginService can list all the names of the Named Plugins which implement an interface. You may need this, for example, to implement a menu in a
user interface that presents a choice among all possible plugins. See the getAllPluginNames() method.
Note that it only returns the plugin name, so if you need a more sophisticated or meaningful "label" (i.e. a key into the I18N message catalog) then you
should add a method to the plugin itself to return that.
Implementation
Note: The PluginService refers to interfaces and classes internally only by their names whenever possible, to avoid loading classes until absolutely
necessary (i.e. to create an instance). As you'll see below, self-named classes still have to be loaded to query them for names, but for the most part it can
avoid loading classes. This saves a lot of time at start-up and keeps the JVM memory footprint down, too. As the Plugin Manager gets used for more
classes, this will become a greater concern.
The only downside of "on-demand" loading is that errors in the configuration don't get discovered right away. The solution is to call the checkConfiguration()
method after making any changes to the configuration.
LegacyPluginServiceImpl Class
The LegacyPluginServiceImpl class is the default PluginService implementation. While it is possible to implement your own version of PluginService, no
other implementations are provided with DSpace.
Object getSinglePlugin(Class interfaceClass) - Returns an instance of the singleton (single) plugin implementing the given
interface. There must be exactly one single plugin configured for this interface, otherwise the PluginConfigurationError is thrown. Note that this is
the only "get plugin" method which throws an exception. It is typically used at initialization time to set up a permanent part of the system so any
failure is fatal. See the plugin.single configuration key for configuration details.
Object[] getPluginSequence(Class interfaceClass) - Returns instances of all plugins that implement the interface interfaceClass, in
an Array. Returns an empty array if there are no matching plugins. The order of the plugins in the array is the same as the order of their class names in the
configuration's value field. See the plugin.sequence configuration key for configuration details.
Object getNamedPlugin(Class interfaceClass, String name) - Returns an instance of a plugin that implements the interface
interfaceClass and is bound to a name matching name. If there is no matching plugin, it returns null. The names are matched by String.equals(). See
the plugin.named and plugin.selfnamed configuration keys for configuration details.
String[] getAllPluginNames(Class interfaceClass) - Returns all of the names under which a named plugin implementing the
interface interfaceClass can be requested (with getNamedPlugin()). The array is empty if there are no matches. Use this to populate a menu of
plugins for interactive selection, or to document what the possible choices are. The names are NOT returned in any predictable order, so you may
wish to sort them first. Note: Since a plugin may be bound to more than one name, the list of names this returns does not represent the list of
plugins. To get the list of unique implementation classes corresponding to the names, you might have to eliminate duplicates (i.e. create a Set of
classes).
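The de-duplication step suggested above can be sketched as follows; the name-to-class bindings are sample data standing in for a real PluginService lookup:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch of the de-duplication step above: several plugin names may be
// bound to one implementation class, so collect the classes into a Set.
public class PluginNameDedupSketch {

    public static Set<String> uniqueClasses(Map<String, String> nameToClass) {
        return new TreeSet<>(nameToClass.values()); // duplicates collapse here
    }

    public static void main(String[] args) {
        Map<String, String> bindings = new HashMap<>();
        bindings.put("GIF", "JPEGFilter");
        bindings.put("JPEG", "JPEGFilter");
        bindings.put("image/png", "JPEGFilter");
        bindings.put("TeX", "TeXFilter");
        // four names, but only two implementation classes
        System.out.println(uniqueClasses(bindings));
    }
}
```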
SelfNamedPlugin Class
A named plugin implementation must extend this class if it wants to supply its own Plugin Name(s). See Self-Named Plugins for why this is sometimes
necessary.
Errors and Exceptions
PluginConfigurationError
An error of this type means the caller asked for a single plugin, but either there was no single plugin configured matching that interface, or there was more
than one. Either case causes a fatal configuration error.
PluginInstantiationException
This exception indicates a fatal error when instantiating a plugin class. It should only be thrown when something unexpected happens in the course of
instantiating a plugin, e.g. an access error, class not found, etc. Simply not finding a class in the configuration is not an exception.
This is a RuntimeException so it doesn't have to be declared, and can be passed all the way up to a generalized fatal exception handler.
Configuring Plugins
All of the Plugin Service's configuration comes from the DSpace Configuration Service (see Configuration Reference). You can configure these
characteristics of each plugin:
1. Interface: Classname of the Java interface which defines the plugin, including package name. e.g. org.dspace.app.mediafilter.FormatFilter
2. Implementation Class: Classname of the implementation class, including package. e.g. org.dspace.app.mediafilter.PDFFilter
3. Names: (Named plugins only) There are two ways to bind names to plugins: listing them in the value of a plugin.named.interface key, or
configuring a class in plugin.selfnamed.interface which extends the SelfNamedPlugin class.
4. Reusable option: (Optional) This is declared in a plugin.reusable configuration line. Plugins are reusable by default, so you only need to
configure the non-reusable ones.
Configuring Singleton (Single) Plugins
This kind of configuration entry defines a Singleton Plugin, i.e. the single implementation class for an interface:
plugin.single.interface = classname
For example, this configures the class org.dspace.checker.SimpleDispatcher as the plugin for interface org.dspace.checker.BitstreamDispatcher:
plugin.single.org.dspace.checker.BitstreamDispatcher=org.dspace.checker.SimpleDispatcher
Configuring Sequence of Plugins
This kind of configuration entry defines a Sequence Plugin, which is bound to a sequence of implementation classes. The key identifies the interface, and
the value is a comma-separated list of classnames:
plugin.sequence.interface = classname, ...
The plugins are returned by getPluginSequence() in the same order as their classes are listed in the configuration value.
For example, this entry configures Stackable Authentication with three implementation classes:
plugin.sequence.org.dspace.eperson.AuthenticationMethod = \
org.dspace.eperson.X509Authentication, \
org.dspace.eperson.PasswordAuthentication, \
edu.mit.dspace.MITSpecialGroup
Configuring Named Plugins
1. Plugins Named in the Configuration A named plugin which gets its name(s) from the configuration is listed in this kind of entry:
plugin.named.interface = classname = name [ , name.. ] [ classname = name.. ]
The syntax of the configuration value is: classname, followed by an equal-sign and then at least one plugin name. Bind more names to the same
implementation class by adding them here, separated by commas. Names may include any character other than comma (,) and equal-sign (=).
For example, this entry creates one plugin with the names GIF, JPEG, and image/png, and another with the name TeX:
plugin.named.org.dspace.app.mediafilter.MediaFilter = \
org.dspace.app.mediafilter.JPEGFilter = GIF, JPEG, image/png \
org.dspace.app.mediafilter.TeXFilter = TeX
This example shows a plugin name with an embedded whitespace character. Since comma (,) is the separator character between plugin names,
spaces are legal (between words of a name; leading and trailing spaces are ignored).This plugin is bound to the names "Adobe PDF", "PDF", and
"Portable Document Format".
plugin.named.org.dspace.app.mediafilter.MediaFilter = \
org.dspace.app.mediafilter.TeXFilter = TeX \
org.dspace.app.mediafilter.PDFFilter = Adobe PDF, PDF, Portable Document Format
NOTE: Since there can only be one key with plugin.named. followed by the interface name in the configuration, all of the plugin implementations
must be configured in that entry.
2. Self-Named Plugins Since a self-named plugin supplies its own names through a static method call, the configuration only has to include its
interface and classname:
plugin.selfnamed.interface = classname [ , classname.. ]
The following example first demonstrates how the plugin class, XsltDisseminationCrosswalk, is configured to implement its own names "MODS"
and "DublinCore". These come from the keys starting with crosswalk.dissemination.stylesheet. The value is a stylesheet file. The class is then
configured as a self-named plugin:
crosswalk.dissemination.stylesheet.DublinCore = xwalk/TESTDIM-2-DC_copy.xsl
crosswalk.dissemination.stylesheet.MODS = xwalk/mods.xsl
plugin.selfnamed.org.dspace.content.metadata.DisseminationCrosswalk = \
org.dspace.content.metadata.MODSDisseminationCrosswalk, \
org.dspace.content.metadata.XsltDisseminationCrosswalk
NOTE: Since there can only be one key with plugin.selfnamed. followed by the interface name in the configuration, all of the plugin
implementations must be configured in that entry. The MODSDisseminationCrosswalk class is only shown to illustrate this point.
Use Cases
Here are some usage examples to illustrate how the Plugin Service works.
A Singleton Plugin
This shows how to configure and access a single anonymous plugin, such as the BitstreamDispatcher plugin:
Configuration:
plugin.single.org.dspace.checker.BitstreamDispatcher=org.dspace.checker.SimpleDispatcher
The following code fragment shows how dispatcher, the service object, is used to iterate over bitstream IDs:
int id = dispatcher.next();
while (id != BitstreamDispatcher.SENTINEL)
{
    // process the bitstream with this id
    id = dispatcher.next();
}
Plugin that Names Itself
Here is the configuration file listing both the plugin's own configuration and the PluginService config line:
crosswalk.dissemination.stylesheet.DublinCore = xwalk/TESTDIM-2-DC_copy.xsl
crosswalk.dissemination.stylesheet.MODS = xwalk/mods.xsl
plugin.selfnamed.org.dspace.content.metadata.DisseminationCrosswalk = \
org.dspace.content.metadata.XsltDisseminationCrosswalk
The following look into the implementation shows how the class finds configuration entries to populate the array of plugin names returned by the getPluginNames() method. Note also that in its getStylesheet() method, the class uses the plugin name that created the current instance (returned by getPluginInstanceName()) to find the correct stylesheet.
public String[] getPluginNames()
{
    // Gather every configuration key beginning with the stylesheet prefix;
    // the remainder of each key is one of this plugin's names.
    List aliasList = new ArrayList();
    Enumeration pe = ConfigurationManager.propertyNames();
    String prefix = "crosswalk.dissemination.stylesheet.";
    while (pe.hasMoreElements())
    {
        String key = (String)pe.nextElement();
        if (key.startsWith(prefix))
            aliasList.add(key.substring(prefix.length()));
    }
    return (String[])aliasList.toArray(new String[aliasList.size()]);
}
Stackable Authentication
The Stackable Authentication mechanism needs to know all of the plugins configured for the interface, in the order of configuration, since order is
significant. It gets a Sequence Plugin from the Plugin Manager. Refer to the Configuration Section on Stackable Authentication for further details.
Workflow System
The primary classes are:
org.dspace.workflow.WorkflowService: responds to events and manages the WorkflowItem states. There are two implementations: the traditional, default workflow (described below) and the Configurable Workflow.
org.dspace.eperson.Group: people who can perform workflow tasks are defined in EPerson Groups.
The default workflow system models the states of an Item in a state machine with 5 primary states (SUBMIT, STEP_1, STEP_2, STEP_3, ARCHIVE). STEP_1 through STEP_3 are the three optional steps where the item can be viewed and corrected by different groups of people. In fact, there are really 8 states, adding STEP_1_POOL, STEP_2_POOL, and STEP_3_POOL: these pooled states hold items that are waiting to enter the primary states. Optionally, you can also choose to enable the enhanced, Configurable Workflow if you wish to have more control over your workflow steps/states. (Note: the remainder of this description relates to the traditional, default workflow. For more information on the Configurable Workflow option, visit Configurable Workflow.)
The WorkflowService is invoked by events. While an Item is being submitted, it is held by a WorkspaceItem. Calling the start() method in the
WorkflowService converts a WorkspaceItem to a WorkflowItem, and begins processing the WorkflowItem's state. Since all three steps of the workflow are
optional, if no steps are defined, then the Item is simply archived.
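The states described above, including the claim/unclaim transitions between pool states and steps, can be sketched as a small state enum. This is an illustrative model only, not a DSpace class:

```java
// Illustrative only (not a DSpace class): the eight states of the default
// workflow, with the pooled states items wait in before each step.
enum WorkflowState {
    SUBMIT, STEP_1_POOL, STEP_1, STEP_2_POOL, STEP_2, STEP_3_POOL, STEP_3, ARCHIVE;

    // A 'claim' event moves an item from a pool state into the matching step.
    WorkflowState claim() {
        switch (this) {
            case STEP_1_POOL: return STEP_1;
            case STEP_2_POOL: return STEP_2;
            case STEP_3_POOL: return STEP_3;
            default: throw new IllegalStateException("nothing to claim in " + this);
        }
    }

    // An 'unclaim' event returns a claimed item to its pool.
    WorkflowState unclaim() {
        switch (this) {
            case STEP_1: return STEP_1_POOL;
            case STEP_2: return STEP_2_POOL;
            case STEP_3: return STEP_3_POOL;
            default: throw new IllegalStateException("nothing to unclaim in " + this);
        }
    }
}
```

The advance(), reject(), and abort() events described below move items between these states or out of the workflow entirely.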
Workflows are set per Collection, and steps are defined by creating corresponding entries in the List named workflowGroup. If you wish the workflow to have a step 1, use the administration tools for Collections to create a workflow Group with the members you want to be able to view and approve the Item; workflowGroup[0] is then set to the ID of that Group.
If a step is defined in a Collection's workflow, then the WorkflowItem's state is set to that step_POOL. This pooled state is the WorkflowItem waiting for an
EPerson in that group to claim the step's task for that WorkflowItem. The WorkflowManager emails the members of that Group notifying them that there is
a task to be performed (the text is defined in config/emails,) and when an EPerson goes to their 'My DSpace' page to claim the task, the WorkflowManager
is invoked with a claim event, and the WorkflowItem's state advances from STEP_x_POOL to STEP_x (where x is the corresponding step.) The EPerson
can also generate an 'unclaim' event, returning the WorkflowItem to the STEP_x_POOL.
The WorkflowService also handles an advance() event, which advances the WorkflowItem to the next state. If there are no further states, the WorkflowItem is removed and the Item is then archived. An EPerson performing one of the tasks can reject the Item, which stops the workflow, rebuilds the WorkspaceItem for it, and sends a rejection note to the submitter. More drastically, an abort() event can be generated by the admin tools to cancel a workflow outright.
Administration Toolkit
The org.dspace.administer package contains some classes for administering a DSpace system that are not generally needed by most applications.
The CreateAdministrator class is a simple command-line tool, executed via [dspace]/bin/dspace create-administrator, that creates an
administrator e-person with information entered from standard input. This is generally used only once when a DSpace system is initially installed, to create
an initial administrator who can then use the Web administration UI to further set up the system. This script does not check for authorization, since it is
typically run before there are any e-people to authorize! Since it must be run as a command-line tool on the server machine, generally this shouldn't cause
a problem. A possibility is to have the script only operate when there are no e-people in the system already, though in general, someone with access to
command-line scripts on your server is probably in a position to do what they want anyway!
The DCType class is similar to the org.dspace.content.BitstreamFormat class. It represents an entry in the Dublin Core type registry, that is, a particular
element and qualifier, or unqualified element. It is in the administer package because it is only generally required when manipulating the registry itself.
Elements and qualifiers are specified as literals in org.dspace.content.Item methods and the org.dspace.content.Metadatum class. Only administrators
may modify the Dublin Core type registry.
The org.dspace.administer.RegistryLoader class contains methods for initializing the Dublin Core type registry and bitstream format registry with entries in
an XML file. Typically this is executed via the command line during the build process (see build.xml in the source.) To see examples of the XML formats,
see the files in config/registries in the source directory. There is no XML schema; the files aren't validated strictly when loaded in.
E-person/Group Manager
DSpace keeps track of registered users with the org.dspace.eperson.EPerson class. The class has methods to create and manipulate an EPerson such as get and set methods for first and last names, email, and password. (Actually, there is no getPassword() method: an MD5 hash of the password is stored, and can only be verified with the checkPassword() method.) There are find methods to find an EPerson by email (which is assumed to be unique), or to find all EPeople in the system.
The EPerson object should probably be reworked to allow for easy expansion; the current EPerson object tracks pretty much only what MIT was interested
in tracking - first and last names, email, phone. The access methods are hardcoded and should probably be replaced with methods to access arbitrary
name/value pairs for institutions that wish to customize what EPerson information is stored.
Groups are simply lists of EPerson objects. Other than membership, Group objects have only one other attribute: a name. Group names must be unique,
so (for groups associated with workflows) we have adopted naming conventions where the role of the group is its name, such as COLLECTION_100_ADD.
Groups add and remove EPerson objects with addMember() and removeMember() methods. One important thing to know about groups is that they store
their membership in memory until the update() method is called - so when modifying a group's membership don't forget to invoke update() or your changes
will be lost! Since group membership is used heavily by the authorization system a fast isMember() method is also provided.
Two specific groups are created when DSpace is installed: Administrator (which can bypass authorization) and Anonymous (which is assigned to all
sessions that are not logged in). The code expects these groups to exist. They cannot be renamed or deleted.
Another kind of Group is also implemented in DSpace: special Groups. The Context object for each session carries around a List of Group IDs that the user is also a member of; currently the MITUser Group ID is added to the list of a user's special groups if certain IP address or certificate criteria are met.
Authorization
The primary classes are org.dspace.authorize.AuthorizeService, which performs the authorization checks, and org.dspace.authorize.ResourcePolicy, which represents the policies themselves.
The authorization system is based on the classic 'police state' model of security; no action is allowed unless it is expressed in a policy. The policies are attached to resources (hence the name ResourcePolicy), and detail who can perform that action. The resource can be any of the DSpace object types, listed in org.dspace.core.Constants (BITSTREAM, ITEM, COLLECTION, etc.) The 'who' is made up of EPerson groups. The actions are also in Constants.java (READ, WRITE, ADD, etc.) The only non-obvious actions are ADD and REMOVE, which are authorizations for container objects. To be able to create an Item, you must have ADD permission in a Collection, which contains Items. (Communities, Collections, Items, and Bundles are all container objects.)
Currently most of the read policy checking is done with items; communities and collections are assumed to be openly readable, but items and their bitstreams are checked. Separate policy checks for items and their bitstreams enable policies that allow publicly readable items while restricting parts of their content to certain groups.
Three new attributes have been introduced in the ResourcePolicy class as part of the DSpace Embargo Contribution:
While rpname and rpdescription are fields manageable by the users, rptype is a field managed by the system. It represents the type a resource policy can assume, among the following:
TYPE_SUBMISSION: all the policies added automatically during the submission process
TYPE_WORKFLOW: all the policies added automatically during the workflow stage
TYPE_CUSTOM: all the custom policies added by users
TYPE_INHERITED: all the policies inherited from the parent DSO.
A custom policy, created for the purpose of an embargo, could look like this:
policy_id: 4847
resource_type_id: 2
resource_id: 89
action_id: 0
eperson_id:
epersongroup_id: 0
start_date: 2013-01-01
end_date:
rpname: Embargo Policy
rpdescription: Embargoed through 2012
rptype: TYPE_CUSTOM
ResourcePolicies are very simple, and there are quite a lot of them. Each can only list a single group, a single action, and a single object. So each object
will likely have several policies, and if multiple groups share permissions for actions on an object, each group will get its own policy. (It's a good thing
they're small.)
Special Groups
All users are assumed to be part of the public group (ID=0.) DSpace admins (ID=1) are automatically part of all groups, much like super-users in the Unix
OS. The Context object also carries around a List of special groups, which are also first checked for membership. These special groups are used at MIT to
indicate membership in the MIT community, something that is very difficult to enumerate in the database! When a user logs in with an MIT certificate or
with an MIT IP address, the login code adds this MIT user group to the user's Context.
Handles are stored internally in the handle database table in the form:
1721.123/4567
Typically when they are used outside of the system they are displayed in either URI or "URL proxy" forms:
hdl:1721.123/4567
http://hdl.handle.net/1721.123/4567
It is the responsibility of the caller to extract the basic form from whichever displayed form is used.
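Extracting the basic form from either displayed form is straightforward string handling. The helper below is an illustrative sketch, not part of the DSpace API:

```java
// Illustrative helper (not part of the DSpace API): recover the basic
// handle form from either of the displayed forms shown above.
class HandleForms {
    static String basicForm(String displayed) {
        if (displayed.startsWith("hdl:")) {
            return displayed.substring(4);                // URI form
        }
        String proxy = "http://hdl.handle.net/";
        if (displayed.startsWith(proxy)) {
            return displayed.substring(proxy.length());   // proxy URL form
        }
        return displayed;                                 // already basic
    }
}
```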
The handle table maps these Handles to resource type/resource ID pairs, where resource type is a value from org.dspace.core.Constants and resource ID
is the internal identifier (database primary key) of the object. This allows Handles to be assigned to any type of object in the system, though as explained in
the functional overview, only communities, collections and items are presently assigned Handles.
The Handle code provides methods for:
Creating a Handle
Finding the Handle for a DSpaceObject (though this is usually only invoked by the object itself, since DSpaceObject has a getHandle() method)
Retrieving the DSpaceObject identified by a particular Handle
Obtaining displayable forms of the Handle (URI or "proxy URL")
HandlePlugin is a simple implementation of the Handle Server's net.handle.hdllib.HandleStorage interface. It only implements the basic Handle retrieval methods, which get information from the handle database table. The CNRI Handle Server is configured to use this plug-in via its config.dct file.
Note that since the Handle server runs as a separate JVM from the DSpace Web applications, it uses a separate Log4J configuration, since Log4J does not support multiple JVMs using the same daily rolling logs. This alternative configuration is located at [dspace]/config/log4j-handle-plugin.properties. The [dspace]/bin/start-handle-server script passes in the appropriate command line parameters so that the Handle server uses this configuration.
In addition to Handles, DSpace also provides basic support for DOIs (Digital Object Identifiers). For more information, visit DOI Digital Object Identifier.
Search
DSpace's search code is a simple, configurable API which currently wraps Apache Solr. See Discovery for more information on how to customize the
default search settings, etc.
Harvesting API
The org.dspace.search package also provides a 'harvesting' API. This allows callers to extract information about items modified within a particular
timeframe, and within a particular scope (all of DSpace, or a community or collection.) Currently this is used by the Open Archives Initiative metadata
harvesting protocol application, and the e-mail subscription code.
The Harvest.harvest method is invoked with the required scope and start and end dates. Either date can be omitted. The dates should be in the ISO 8601, UTC time zone format used elsewhere in the DSpace system.
HarvestedItemInfo objects are returned. These objects are simple containers with basic information about the items falling within the given scope and date range. Depending on parameters passed to the harvest method, the containers and item fields may have been filled out with the IDs of communities and collections containing an item, and the corresponding Item object respectively. Electing not to have these fields filled out means the harvest operation executes considerably faster.
In case it is required, Harvest also offers a method for creating a single HarvestedItemInfo object, which might make things easier for the caller.
Browse API
The browse API uses the same underlying technology as the Search API (Apache Solr, see also Discovery). It maintains indexes of dates, authors, titles
and subjects, and allows callers to extract parts of these:
Title: Values of the Dublin Core element title (unqualified) are indexed. These are sorted in a case-insensitive fashion, with any leading article
removed. For example: "The DSpace System" would appear under 'D' rather than 'T'.
Author: Values of the contributor (any qualifier or unqualified) element are indexed. Since contributor values typically are in the form 'last name,
first name', a simple case-insensitive alphanumeric sort is used which orders authors in last name order. Note that this is an index of authors, and
not items by author. If four items have the same author, that author will appear in the index only once. Hence, the index of authors may be greater
or smaller than the index of titles; items often have more than one author, though the same author may have authored several items. The author
indexing in the browse API does have limitations:
Ideally, a name that appears as an author for more than one item would appear in the author index only once. For example, 'Doe, John' may be the author of tens of items. However, in practice, authors' names often appear in slightly different forms, for example:
Doe, John
Doe, John Stewart
Doe, John S.
Currently, the above three names would all appear as separate entries in the author index even though they may refer to the same author. In order for an author of several papers to correctly appear once in the index, each item must specify exactly the same form of their name, which doesn't always happen in practice.
Another issue is that two authors may have the same name, even within a single institution. If this is the case, they may appear as one author in the index. These issues are typically resolved in libraries with authority control records, which keep a 'preferred' form of the author's name, with extra information (such as date of birth/death) in order to distinguish between authors of the same name. Maintaining such records is a huge task with many issues, particularly when metadata is received from faculty directly rather than trained library catalogers.
Date of Issue: Items are indexed by date of issue. This may be different from the date that an item appeared in DSpace; many items may have
been originally published elsewhere beforehand. The Dublin Core field used is date.issued. The ordering of this index may be reversed so
'earliest first' and 'most recent first' orderings are possible. Note that the index is of items by date, as opposed to an index of dates. If 30 items
have the same issue date (say 2002), then those 30 items all appear in the index adjacent to each other, as opposed to a single 2002 entry.
Since dates in DSpace Dublin Core are in ISO8601, all in the UTC time zone, a simple alphanumeric sort is sufficient to sort by date, including
dealing with varying granularities of date reasonably. For example:
2001-12-10
2002
2002-04
2002-04-05
2002-04-09T15:34:12Z
2002-04-09T19:21:12Z
2002-04-10
Date Accessioned: In order to determine which items most recently appeared, rather than using the date of issue, an item's accession date is
used. This is the Dublin Core field date.accessioned. In other aspects this index is identical to the date of issue index.
Items by a Particular Author: Another browse the API can perform is extracting items by a particular author. The author does not have to be the primary author of an item for that item to be extracted. You can specify a scope, too; that is, you can ask for items by author X in collection Y, for example. This particular flavor of browse is slightly simpler than the others. You cannot presently specify a particular subset of results to be returned; the API call simply returns all of the items by a particular author within a certain scope. Note that the author of the item must exactly match the author passed to the API; see the explanation about the caveats of the author index browsing to see why this is the case.
Subject: Values of the Dublin Core element subject (both unqualified and with any qualifier) are indexed. These are sorted in a case-insensitive
fashion.
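The claim above that a simple alphanumeric sort orders ISO 8601 UTC dates chronologically, even at mixed granularity, can be checked directly. This is illustrative code, not the browse implementation:

```java
// Because DSpace Dublin Core dates are ISO 8601 in the UTC time zone, a
// plain lexicographic (alphanumeric) sort orders them chronologically,
// even when the granularity varies.
import java.util.Arrays;

class DateSortDemo {
    static String[] sorted(String... dates) {
        String[] copy = dates.clone();
        Arrays.sort(copy);   // simple alphanumeric sort, no date parsing
        return copy;
    }
}
```

Running this on the document's example dates reproduces the listed order: a bare year sorts before any of its months, and a full timestamp sorts correctly among day-level dates.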
The results of invoking Browse.getItemsByTitle, with the focus (marked FOCUS> below) set to a particular title, might look like this:
Rabble-Rousing Rabbis From Sardinia
Reality TV: Love It or Hate It?
FOCUS> The Really Exciting Research Video
Recreational Housework Addicts: Please Visit My House
Regional Television Variation Studies
Revenue Streams
Ridiculous Example Titles: I'm Out of Ideas
Note that in the case of title and date browses, Item objects are returned as opposed to actual titles. In these cases, you can specify the 'focus' to be a
specific item, or a partial or full literal value. In the case of a literal value, if no entry in the index matches exactly, the closest match is used as the focus.
It's quite reasonable to specify a focus of a single letter, for example.
Being able to specify a specific item to start at is particularly important with dates, since many items may have the same issue date. Say 30 items in a collection have the issue date 2002. To be able to page through the index 20 items at a time, you need to be able to specify exactly which item's 2002 is the focus of the browse; otherwise, each time you invoked the browse code, the results would start at the first item with the issue date 2002.
Author browses return String objects with the actual author names. You can only specify the focus as a full or partial literal String.
Another important point to note is that presently, the browse indexes contain metadata for all items in the main archive, regardless of authorization policies.
This means that all items in the archive will appear to all users when browsing. Of course, should the user attempt to access a non-public item, the usual
authorization mechanism will apply. Whether this approach is ideal is under review; implementing the browse API such that the results retrieved reflect a
user's level of authorization may be possible, but rather tricky.
Checksum checker
Checksum checker is used to verify every item within DSpace. While DSpace calculates and records the checksum of every file submitted to it, the checker can determine whether a file has since been changed. The idea is that the earlier you can identify that a file has changed, the more likely you are to be able to recover it (assuming the change was not wanted).
The org.dspace.checker.CheckerCommand class is the core of the checksum checker tool: it calculates checksums for each bitstream whose ID is in the most_recent_checksum table, and compares them against the last calculated checksum for that bitstream.
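The checker's core comparison can be sketched as follows. This is an illustrative, self-contained version, not the CheckerCommand implementation (which reads bitstreams from storage and checksums from the most_recent_checksum table):

```java
// Illustrative sketch (not the CheckerCommand implementation): compute a
// bitstream's MD5 checksum and compare it with the most recently stored value.
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ChecksumDemo {
    static String md5Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);  // MD5 is always available
        }
    }

    // True when the freshly computed checksum matches the stored one,
    // i.e. the bitstream has not changed since it was last checked.
    static boolean unchanged(byte[] content, String storedChecksum) {
        return md5Hex(content).equals(storedChecksum);
    }
}
```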
OpenSearch Support
DSpace is able to support the OpenSearch protocol. For those not acquainted with the standard, here is a very brief introduction, with emphasis on what possibilities it holds for current use and future development.
OpenSearch is a small set of conventions and documents for describing and using 'search engines', meaning any service that returns a set of results for a query. It is nearly ubiquitous, but also nearly invisible, in modern web sites with search capability. If you look at the page source of Wikipedia, Facebook, CNN, etc., you will find a buried link element declaring OpenSearch support. It is very much a lowest-common-denominator abstraction (think Google box), but does provide a means to extend its expressive power. This first implementation for DSpace supports none of these extensions, many of which are of potential value, so it should be regarded as a foundation, not a finished solution. The short answer is that DSpace appears as a 'search engine' to OpenSearch-aware software.
Another way to look at OpenSearch is as a RESTful web service for search, very much like SRW/U, but considerably simpler. This comparative loss of
power is offset by the fact that it is widely supported by web tools and players: browsers understand it, as do large metasearch tools.
Browser Integration: Many recent browsers (IE7+, FF2+) can detect, or 'autodiscover', links to the document describing the search engine. Thus you can easily add your own or other DSpace instances to the drop-down list of search engines in your browser. This list typically appears in the upper right corner of the browser, with a search box. In Firefox, for example, when you visit a site supporting OpenSearch, the drop-down list widget changes color, and if you open it to show the list of search engines, you are offered an opportunity to add the site to the list. IE works nearly the same way but instead labels the web sites 'search providers'. When you select a DSpace instance as the search engine and enter a search, you are simply sent to the regular search results page of the instance.
Flexible, interesting RSS Feeds: Because one of the formats that OpenSearch specifies for its results is RSS (or Atom), you can turn any search query into an RSS feed. So if there are keywords highly discriminative of content in a collection or repository, these can be turned into a URL that a feed reader can subscribe to. Taken to the extreme, one could take any search a user makes and dynamically compose an RSS feed URL for it in the page of returned results. To see an example, if you have a DSpace instance with OpenSearch enabled, try:
https://demo.dspace.org/server/opensearch/search?query=<your query>
The default format returned is Atom 1.0, so you should see an Atom document containing your search results.
You can extend the syntax with a few other parameters, as follows:
Parameter Values
scope UUID of a collection or community to restrict the search to
rpp number indicating the number of results per page (i.e. per request)
sort metadata field to sort by, for example "dc.title" (see webui.itemlist.sort-option in dspace.cfg)
Multiple parameters may be specified on the query string, using the "&" character as the delimiter, e.g.:
https://demo.dspace.org/server/opensearch/search?query=<your query>&format=rss&scope=40250cb0-09d3-4b22-
b5c5-c39bc815f6ea
Cheap metasearch: Search aggregators like A9 (Amazon) recognize OpenSearch-compliant providers, and so can be added to metasearch sets using their UIs. Then your site can be used to aggregate search results with others.
When OpenSearch is enabled in DSpace, informational "link" tags will be embedded into the HTML of every page. These link tags will allow tools
to easily discover the OpenSearch Service Document (`/service`) and RSS / Atom feeds. The links appear in the HTML head tag and look like
this:
<link href="https://demo.dspace.org/server/opensearch/search?format=atom&query=*" type="application
/atom+xml" rel="alternate" title="Sitewide Atom feed">
<link href="https://demo.dspace.org/server/opensearch/search?format=rss&query=*" type="application
/rss+xml" rel="alternate" title="Sitewide RSS feed">
<link href="https://demo.dspace.org/server/opensearch/search/service" type="application/atom+xml" rel="
search" title="DSpace">
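Putting the query parameters described above together, a client might compose a feed URL as sketched below. The host matches the demo server used in the document's examples; the query value is hypothetical:

```java
// Sketch: composing an OpenSearch feed URL from the parameters above.
// The host is the demo server from the examples; the query is hypothetical.
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class OpenSearchUrl {
    static String feedUrl(String query, String format, int rpp) {
        return "https://demo.dspace.org/server/opensearch/search"
                + "?query=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&format=" + format
                + "&rpp=" + rpp;  // results per page
    }
}
```

Percent-encoding the query keeps multi-word searches valid; the other parameters are appended with "&" as described above.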
Configuration is through the dspace.cfg file. See OpenSearch Support for more details.
Embargo Support
What is an Embargo?
An embargo is a temporary access restriction placed on content, commencing at time of accession. Its scope or duration may vary, but the fact that it eventually expires is what distinguishes it from other content restrictions. For example, it is not unusual for content destined for DSpace to come with permanent restrictions on use or access based on license-driven or other IP-based requirements that limit access to institutionally affiliated users. Restrictions such as these are imposed and managed using standard administrative tools in DSpace, typically by attaching specific policies to Items, Collections, Bitstreams, etc. The embargo functionality introduced in 1.6, however, includes tools to automate the imposition and removal of restrictions in managed timeframes.
1. Terms Assignment. The first step in placing an embargo on an item is to attach (assign) 'terms' to it. If these terms are missing, no embargo will
be imposed. As we will see below, terms are carried in a configurable DSpace metadata field, so assigning terms just means assigning a value to
a metadata field. This can be done in a web submission user interface form, in a SWORD deposit package, a batch import, etc. - anywhere
metadata is passed to DSpace. The terms are not immediately acted upon, and may be revised, corrected, removed, etc., up until the next stage of the life-cycle. Thus a submitter could enter one value and a collection editor could later replace it; only the last value will be used. Since metadata fields are multivalued, theoretically there can be multiple terms values, but in the default implementation only one is recognized.
2. Terms interpretation/imposition. In DSpace terminology, when an item has exited the last of any workflow steps (or if none have been defined
for it), it is said to be 'installed' into the repository. At this precise time, the 'interpretation' of the terms occurs, and a computed 'lift date' is
assigned, which like the terms is recorded in a configurable metadata field. It is important to understand that this interpretation happens only
once, (just like the installation), and cannot be revisited later. Thus, although an administrator can assign a new value to the metadata field
holding the terms after the item has been installed, this will have no effect on the embargo, whose 'force' now resides entirely in the 'lift date'
value. For this reason, you cannot embargo content already in your repository (at least using standard tools). The other action taken at installation
time is the actual imposition of the embargo. The default behavior here is simply to remove the read policies on all the bundles and bitstreams
except for the "LICENSE" or "METADATA" bundles. See the section on Extending Embargo Functionality for how to alter this behavior. Also note
that since these policy changes occur before installation, there is no time during which embargoed content is 'exposed' (accessible by non-
administrators). The terms interpretation and imposition together are called 'setting' the embargo, and the component that performs them both is
called the embargo 'setter'.
3. Embargo Period. After an embargoed item has been installed, the policy restrictions remain in effect until removed. This is not an automatic
process, however: a 'lifter' must be run periodically to look for items whose 'lift date' is past. Note that this means the effective removal of an
embargo is not the lift date, but the earliest date after the lift date that the lifter is run. Typically, a nightly cron-scheduled invocation of the lifter is
more than adequate, given the granularity of embargo terms. Also note that during the embargo period, all metadata of the item remains visible.
This default behavior can be changed. One final point to note is that the 'lift date', although it was computed and assigned during the previous stage, is in the end a regular metadata field. That means that if there are extraordinary circumstances that require an administrator (or collection editor, or anyone with edit permissions on metadata) to change the lift date, they can do so. Thus, they can 'revise' the lift date without reference to the original terms. This date will be checked the next time the 'lifter' is run. One could immediately lift the embargo by setting the lift date to the current day, or change it to 'forever' to indefinitely postpone lifting.
4. Embargo Lift. When the lifter discovers an item whose lift date is in the past, it removes (lifts) the embargo. The default behavior of the lifter is to add the resource policies that would have been added had the embargo not been imposed. That is, it replicates the standard DSpace behavior, in which an item inherits its policies from its owning collection. As with all other parts of the embargo system, you may replace or extend the default behavior of the lifter (see the section on Extending Embargo Functionality). You may wish, for example, to send an email to an administrator or other interested parties when an embargoed item becomes available.
5. Post Embargo. After the embargo has been lifted, the item ceases to respond to any of the embargo life-cycle events. The values of the
metadata fields reflect essentially historical or provenance values. With the exception of the additional metadata fields, they are indistinguishable
from items that were never subject to embargo.
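The date comparison at the heart of the lifter (steps 3 and 4 above) can be sketched as follows. This is an illustrative check, not the DSpace implementation:

```java
// Illustrative sketch (not the DSpace lifter): an embargo is liftable once
// the recorded lift date (an ISO 8601 metadata value) is no longer in the
// future relative to the day the lifter runs.
import java.time.LocalDate;

class EmbargoLiftCheck {
    static boolean liftable(String liftDate, LocalDate today) {
        return !LocalDate.parse(liftDate).isAfter(today);
    }
}
```

Because the lifter runs periodically (e.g. from a nightly cron job), the effective lift moment is the first run after this check becomes true, not the lift date itself.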
More details on Embargo configuration, including specific examples can be found in the Embargo section of the documentation.
DSpace Services Framework
1 Architectural Overview
1.1 DSpace Kernel
1.1.1 Kernel registration
1.2 Service Manager
2 Basic Usage
2.1 Standalone Applications
2.2 Application Frameworks (Spring, Guice, etc.)
2.3 Web Applications
3 Providers and Plugins
3.1 Activators
3.2 Provider Stacks
4 Core Services
4.1 Caching Service
4.2 Configuration Service
4.3 EventService
4.4 RequestService
4.5 SessionService
5 Examples
5.1 Configuring Event Listeners
6 Tutorials
The DSpace Services Framework is a backporting of the DSpace 2.0 Development Group's work in creating a reasonable and abstractable "Core
Services" layer for DSpace components to operate within. The Services Framework represents a "best practice" for new DSpace architecture and
implementation of extensions to the DSpace application. DSpace Services are best described as a "Simple Registry" where plugins can be "looked up" or
located. The DS2 (DSpace 2.0) core services are the main services that make up a DS2 system. These include services for user and permissions
management, storage, and caching. These services can be used by any developer writing DS2 plugins (e.g. statistics), providers (e.g.
authentication), or user interfaces.
Architectural Overview
DSpace Kernel
The DSpace Kernel manages the startup of, and access to, services in the DSpace Services framework. It provides a simple way to control the core
parts of DSpace and flexible ways to start the kernel. For example, the kernel can run inside a single webapp along with a frontend UI, or it can be
started as part of the servlet container so that multiple webapps use a single kernel (which increases speed and efficiency). The kernel is also
designed to allow multiple kernels to run in a single servlet container, distinguished by identifier keys.
Kernel registration
The kernel will automatically register itself as an MBean when it starts up so that it can be managed via JMX. It allows startup and shutdown and provides
direct access to the ServiceManager and the ConfigurationService. All the other core services can be retrieved from the ServiceManager by their APIs.
Service Manager
The ServiceManager abstracts the concepts of service lookups and lifecycle control. It also manages the configuration of services by allowing properties to
be pushed into the services as they start up (mostly from the ConfigurationService). The ServiceManagerSystem abstraction allows the DSpace
ServiceManager to use different systems to manage its services; the current implementations include Spring and Guice. This allows DSpace 2 to have
very little service management code while remaining flexible and not tied to a specific technology. Developers who are comfortable with those technologies can
consume the services from a parent Spring ApplicationContext or a parent Guice Module. The abstraction also means that we can replace Spring/Guice or
add other dependency injection systems later without requiring developers to change their code. The interface provides simple methods for looking up
services by interface type for developers who do not want to have to use or learn a dependency injection system or are using one which is not currently
supported.
The DS2 kernel is compact so it can be completely started up in a unit test (technically integration test) environment. (This is how we test the kernel and
core services currently). This allows developers to execute code against a fully functional kernel while developing and then deploy their code with high
confidence.
Basic Usage
To use the Framework you must begin by instantiating and starting a DSpaceKernel. The kernel will give you references to the ServiceManager and the
ConfigurationService. The ServiceManager can be used to get references to other services and to register services which are not part of the core set.
Access to the kernel is provided via the Kernel Manager through the DSpace object, which will locate the kernel object and allow it to be used.
Standalone Applications
For standalone applications, access to the kernel is provided via the Kernel Manager and the DSpace object which will locate the kernel object and allow it
to be used.
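This pattern can be sketched as follows. It is an illustrative fragment, not a definitive recipe: it assumes the dspace-api jars are on the classpath, and the classes named (org.dspace.utils.DSpace, org.dspace.kernel.ServiceManager, org.dspace.services.ConfigurationService) are the service APIs described above.

```java
import org.dspace.kernel.ServiceManager;
import org.dspace.services.ConfigurationService;
import org.dspace.utils.DSpace;

public class KernelAccessExample {
    public static void main(String[] args) {
        // The DSpace utility object locates the running kernel for us.
        DSpace dspace = new DSpace();

        // The two directly exposed services:
        ServiceManager serviceManager = dspace.getServiceManager();
        ConfigurationService config = dspace.getConfigurationService();

        // Other services can be looked up from the ServiceManager
        // by their API interface type.
        ConfigurationService sameConfig = serviceManager.getServiceByName(
                ConfigurationService.class.getName(), ConfigurationService.class);
    }
}
```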
(The bin/dspace command-line launcher starts up a kernel in this way.)
Application Frameworks (Spring, Guice, etc.)
Similar to Standalone Applications, but you can use your framework to instantiate an org.dspace.utils.DSpace object.
Web Applications
In web applications, the kernel can be started and accessed through the use of Servlet Filter/ContextListeners which are provided as part of the DSpace 2
utilities. Developers don't need to understand what is going on behind the scenes and can simply write their applications and package them as webapps
and take advantage of the services which are offered by DSpace 2.
Activators
Developers can provide an activator to allow the system to start up their service or provider. It is a simple interface with two methods, which are called by the
ServiceManager to start up the provider(s) and later to shut them down. Activators simply allow a developer to run some arbitrary code in order to create and
register services if desired. They are the mechanism for adding plugins directly to the system via configuration: the activators are just listed in the
configuration file, and the system starts them up in the order it finds them.
Provider Stacks
Utilities are provided to assist with stacking and ordering providers. Ordering is handled via a priority number such that 1 is the highest priority and
something like 10 would be lower. 0 indicates that priority is not important for this service and can be used to ensure the provider is placed at or near the
end without having to set some arbitrarily high number.
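The ordering rule can be illustrated with a small self-contained sketch. The Provider record and ordered() method here are hypothetical, not the actual DSpace utility classes; they only demonstrate the priority semantics described above (1 highest, larger numbers lower, 0 meaning "don't care, place at the end").

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ProviderOrdering {
    // Hypothetical provider holder: 1 is the highest priority,
    // larger numbers are lower, 0 means "priority not important".
    public record Provider(String name, int priority) {}

    public static List<String> ordered(List<Provider> providers) {
        List<Provider> sorted = new ArrayList<>(providers);
        // Treat priority 0 as lower than any explicit priority,
        // so such providers land at the end without an arbitrary number.
        sorted.sort(Comparator.comparingInt(
                p -> p.priority() == 0 ? Integer.MAX_VALUE : p.priority()));
        return sorted.stream().map(Provider::name).toList();
    }

    public static void main(String[] args) {
        System.out.println(ordered(List.of(
                new Provider("dont-care", 0),
                new Provider("secondary", 10),
                new Provider("primary", 1))));
        // [primary, secondary, dont-care]
    }
}
```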
Core Services
The core services are all behind APIs so that they can be reimplemented without affecting developers who are using the services. Most of the services
have plugin/provider points so that customizations can be added into the system without touching the core services code. For example, let's say a deployer
has a specialized authentication system and wants to manage the authentication calls which come into the system. The implementor can simply implement
an AuthenticationProvider and then register it with the DS2 kernel's ServiceManager. This can be done at any time and does not have to be done during
Kernel startup. This allows providers to be swapped out at runtime without disrupting the DS2 service if desired. It can also speed up development by
allowing quick hot redeploys of code during development.
Caching Service
Provides a centralized way to handle caching in the system and thus a single point for configuration and control over all caches. Provider
and plugin developers are strongly encouraged to use this service rather than implementing their own caching. The caching service has the concept of scopes, so
even storing data in maps or lists is discouraged unless there are good reasons to do so.
Configuration Service
The ConfigurationService controls the external and internal configuration of DSpace 2. It reads Properties files when the kernel starts up and merges them
with any dynamic configuration data which is available from the services. This service allows settings to be updated as the system is running, and also
defines listeners which allow services to know when their configuration settings have changed and take action if desired. It is the central point to access
and manage all the configuration settings in DSpace 2.
It can also be used to manage configuration for providers and plugins.
EventService
Handles events and provides access to listeners for consumption of events.
RequestService
In DS2 a request is an atomic transaction in the system. It is likely to be an HTTP request in many cases but it does not have to be. This service provides
the core services with a way to manage atomic transactions so that when a request comes in which requires multiple things to happen they can either all
succeed or all fail without each service attempting to manage this independently. In a nutshell this simply allows identification of the current request and
the ability to discover whether it succeeded or failed when it ends. Nothing in the system enforces usage of this service, but we encourage developers who are
interacting with the system to use it so they know whether the request they are participating in has succeeded or failed and can take
appropriate actions.
SessionService
In DS2 a session is like an HttpSession (and generally is actually one) so this service is here to allow developers to find information about the current
session and to access information in it. The session identifies the current user (if authenticated) so it also serves as a way to track user sessions. Since we
use HttpSession directly it is easy to mirror sessions across multiple servers in order to allow for no-interruption failover for users when servers go offline.
Examples
In Spring:
<beans>
  <bean id="dspace.eventService"
        factory-bean="dspace"
        factory-method="getEventService"/>

  <bean class="org.my.EventListener">
    <property name="eventService">
      <ref bean="dspace.eventService"/>
    </property>
  </bean>
</beans>
(org.my.EventListener will need to register itself with the EventService, for which it is passed a reference to that service via the eventService property.)
or in Java:
(This registers the listener externally – the listener code assumes it is registered.)
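The Java example is elided above; a minimal sketch of what it likely resembles is shown below. It assumes the dspace-api classes on the classpath, and org.my.EventListener is the placeholder listener name carried over from the Spring example.

```java
import org.dspace.services.EventService;
import org.dspace.utils.DSpace;

// Look up the EventService via the DSpace utility wrapper and
// register the listener with it explicitly (the listener code
// assumes it has already been registered).
EventService eventService = new DSpace().getEventService();
eventService.registerEventListener(new org.my.EventListener());
```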
Tutorials
Several tutorials on Spring / DSpace Services are available:
Storage Layer
In this section, we explain the storage layer: the database structure, maintenance, and the bitstream store and its configuration. The bitstream store, also
known as the assetstore or bitstore, holds the uploaded, ingested, or generated files (documents, images, audio, video, datasets, ...), whereas the database
holds all of the metadata, organization, and permissions of content.
[Figure: DSpace 6 database schema (Postgres). Instructions on updating this schema diagram are in How to update database schema diagram.]
DSpace uses FlywayDB to perform automated database initialization and upgrades. Flyway's role is to initialize the database tables (and default
content) prior to Hibernate initialization.
The org.dspace.storage.rdbms.DatabaseUtils class manages all Flyway API calls, and executes the SQL migrations under the
org.dspace.storage.rdbms.sqlmigration package and the Java migrations under the org.dspace.storage.rdbms.migration package.
Once all database migrations have run, a series of Flyway Callbacks are triggered to initialize the (empty) database with required default
content. For example, callbacks exist for adding default DSpace Groups (GroupServiceInitializer), default Metadata & Format
Registries (DatabaseRegistryUpdater), and the default Site object (SiteServiceInitializer). All Callbacks are under the
org.dspace.storage.rdbms package.
While Flyway is automatically initialized and executed during startup, various Database Utilities are also available on the command
line. These utilities allow you to manually trigger database upgrades or check the status of your database.
DSpace uses Hibernate ORM as the object relational mapping layer between the DSpace database and the DSpace code.
The main Hibernate configuration can be found at [dspace]/config/hibernate.cfg.xml
Hibernate initialization is triggered via Spring (beans) defined [dspace]/config/spring/api/core-hibernate.xml. This Spring
configuration pulls in some settings from DSpace Configuration, namely all Database (db.*) settings defined there.
All DSpace Object Classes provide a DAO (Data Access Object) implementation class that extends the GenericDAO interface defined in
org.dspace.core.GenericDAO. The default (abstract) implementation is org.dspace.core.AbstractHibernateDAO.
The DSpace Context object (org.dspace.core.Context) provides access to the configured org.dspace.core.DBConnection
(Database Connection), which is HibernateDBConnection by default. The org.dspace.core.HibernateDBConnection class
provides access to the Hibernate Session interface (org.hibernate.Session) and its Transactions.
Each Hibernate Session opens a single database connection when it is created, and holds onto it until the Session is closed. A
Session may consist of one or more Transactions. Sessions are NOT thread-safe (so individual objects cannot be shared
between threads).
Hibernate will intelligently cache objects in the current Hibernate Session (on object access), allowing for optimized
performance.
DSpace provides methods on the Context object to specifically remove (Context.uncacheEntity()) or reload
(Context.reloadEntity()) objects within Hibernate's Session cache.
DSpace also provides special Context object "modes" to optimize Hibernate performance for read-only access
(Mode.READ_ONLY) or batch processing (Mode.BATCH_EDIT). These modes can be specified when constructing a new Context
object.
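As a sketch of the mode usage (against the org.dspace.core.Context API named above; exception handling is elided, so treat this as illustrative rather than compilable as-is):

```java
import org.dspace.core.Context;

// Open a read-only Context so Hibernate can optimize caching;
// Mode.BATCH_EDIT would be chosen for large batch jobs instead.
Context context = new Context(Context.Mode.READ_ONLY);
try {
    // ... perform read-only lookups here ...
} finally {
    context.complete();  // or context.abort() on failure
}
```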
Most of the functionality that DSpace uses can be offered by any standard SQL database that supports transactions. However, at this time, DSpace only
provides Flyway migration scripts for PostgreSQL and Oracle (and has only been tested with those database backends). Additional database backends
should be possible, but would minimally require creating custom Flyway migration scripts for that database backend.
Backups: The DSpace database can be backed up and restored using the usual PostgreSQL backup and restore methods, for example with pg_dump and
psql. However, when restoring a database you will need to perform these additional steps:
After restoring a backup, you will need to reset the primary key generation sequences so that they do not produce already-used primary keys. Do
this by executing the SQL in [dspace]/etc/postgres/update-sequences.sql, for example with:
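The command itself is elided above; a typical invocation would be the following (assuming the database and database user are both named dspace, which may differ in your installation):

```
psql -U dspace -f [dspace]/etc/postgres/update-sequences.sql dspace
```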
db.url – The JDBC URL to use for accessing the database. This should not point to a connection pool, since DSpace already implements a
connection pool.
db.driver – JDBC driver class name. Since DSpace presently uses PostgreSQL-specific features, this should be org.postgresql.Driver.
That being said, if you absolutely need to customize your database tables, columns or views, it is possible to create custom Flyway migration scripts,
which should make your customizations easier to manage in future upgrades. (Keep in mind, though, that you may still need to maintain/update your
custom Flyway migration scripts if they ever conflict directly with future DSpace database changes. The only way to "future proof" your local database
changes is to make them as independent as possible, and avoid directly modifying the DSpace database schema as much as possible.)
If you wish to add custom Flyway migrations, they may be added to the following locations:
Custom Flyway SQL migrations may be added anywhere under the org.dspace.storage.rdbms.sqlmigration package (e.g.
[src]/dspace-api/src/main/resources/org/dspace/storage/rdbms/sqlmigration or subdirectories)
Custom Flyway Java migrations may be added anywhere under the org.dspace.storage.rdbms.migration package (e.g.
[src]/dspace-api/src/main/java/org/dspace/storage/rdbms/migration/ or subdirectories)
Additionally, for backwards support, custom SQL migrations may also be placed in the [dspace]/etc/[db-type]/ folder (e.g.
[dspace]/etc/postgres/ for a PostgreSQL-specific migration script)
Adding Flyway migrations to any of the above locations will cause Flyway to auto-discover them. Each migration is run in the order in which it is named. Our
DSpace Flyway script naming convention follows Flyway best practices and is as follows:
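The convention details are elided above. As a general Flyway rule, which DSpace follows, versioned migrations use a "V" prefix, a version, a double underscore, and a description; DSpace's version part combines the DSpace version with a date. The script name below is an illustrative example, not an actual migration shipped with DSpace:

```
V7.0_2021.01.22__description_of_change.sql
```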
Bitstream Store
DSpace offers two means for storing content.
Both are achieved using a simple, lightweight BitStore API providing the actions Get, Put, About, and Remove. Higher-level operations include Store, Register,
Checksum, Retrieve, Cleanup, Clone, and Migrate. Digital assets are stored in the bitstores by being transferred to the bitstore on upload or
ingest. The exception is "registered" objects, whose assets are placed on the filesystem ahead of time, out-of-band; during ingest the database is simply
mapped to where the object already resides. The storage interface is such that additional storage implementations (i.e. other cloud
storage providers) can be added with minimal difficulty.
DSBitStore stores content at a path on the filesystem. This could be a locally attached filesystem, a mounted drive, or a mounted network
filesystem; all are treated as a local filesystem. All DSpace needs for a filesystem store is the path, e.g. /dspace/assetstore or
/opt/data/assetstore. The DSBitStore uses a "directory scatter" method of storing an asset within three levels of subfolders, to keep any single folder
from holding too many objects for normal filesystem performance.
S3BitStore uses Amazon Web Services S3 (Simple Storage Service) to offer limitless cloud storage in a bucket, with each distinct asset stored under a
unique key. S3 is a commercial service (it costs money), but it is available at a low price point and is fully managed: content is automatically
replicated and integrity-checked, with 99.999999999% object durability. Since S3 operates within the AWS network, using other AWS services, such as a
virtual server on EC2, will provide lower network latency than local "on premises" servers. Additionally, there could be in-bound/out-bound bandwidth costs
when a DSpace application server outside the AWS network communicates with S3, compared to AWS-internal EC2 servers. S3 has a
checksum-computing operation, in which the S3 service returns the checksum from the storage service without having to shuttle the bits from S3 to
your application server and compute the checksum there. S3BitStore requires an S3 bucketName, accessKey, and secretKey, and optionally an
AWS region and a subfolder within the bucket.
There can be multiple bitstream stores. Each of these bitstream stores can be traditional storage or S3 storage. This means that the potential storage of a
DSpace system is not bound by the maximum size of a single disk or filesystem, and also that filesystem and S3 storage can be combined in one DSpace
installation. Both filesystem and S3 storage are specified by configuration. Also see Configuring the Bitstream Store below.
Stores are numbered, starting with zero and counting upwards. Each bitstream entry in the database has a store number, used to retrieve the bitstream
when required. As an example of having multiple asset stores configured, assetstore0 might be /dspace/assetstore; when that filesystem gets nearly full, you
could configure a second filesystem path, assetstore1, at /data/assetstore1; later, if you wanted to use S3 for storage, assetstore2 could be
s3://dspace-assetstore-xyz. In this example, various bitstreams (database objects) refer to different assetstores for where their files reside. It is typically
simplest to have a single assetstore configured, with all assets residing in it. If policy dictated, infrequently used masters could be moved to slower/cheaper disk,
whereas access copies stay on the fastest storage. This could be accomplished by migrating assets to different stores.
Bitstreams also have a 38-digit internal ID, different from the primary key ID of the bitstream table row. This is not visible or used outside of the bitstream
storage manager. It is used to determine the exact location (relative to the relevant store directory) in which the bitstream is stored in traditional storage. The
first three pairs of digits are the directory path that the bitstream is stored under; the bitstream is stored in a file with the internal ID as the filename.
For example, a bitstream with the internal ID 12345678901234567890123456789012345678 is stored in the directory:
[dspace]/assetstore/12/34/56/12345678901234567890123456789012345678
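The path computation can be sketched as a small self-contained method. This is illustrative only; the real logic lives inside the DSBitStore implementation.

```java
public class AssetPath {
    // Derive the storage path of a bitstream from its 38-digit internal
    // ID: the first three pairs of digits name three nested directories,
    // and the full internal ID is the filename.
    public static String pathFor(String assetstoreDir, String internalId) {
        return assetstoreDir + "/"
                + internalId.substring(0, 2) + "/"
                + internalId.substring(2, 4) + "/"
                + internalId.substring(4, 6) + "/"
                + internalId;
    }

    public static void main(String[] args) {
        System.out.println(pathFor("/dspace/assetstore",
                "12345678901234567890123456789012345678"));
        // /dspace/assetstore/12/34/56/12345678901234567890123456789012345678
    }
}
```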
Using a randomly-generated 38-digit number means that the 'number space' is less cluttered than simply using the primary keys, which are
allocated sequentially and are thus close together. This means that the bitstreams in the store are distributed around the directory structure,
improving access efficiency.
The internal ID is used as the filename partly to avoid requiring an extra lookup of the filename of the bitstream, and partly because bitstreams
may be received from a variety of operating systems. The original name of a bitstream may be an illegal UNIX filename.
When storing a bitstream, the BitstreamStorageService DOES set the following fields in the corresponding database table row:
bitstream_id
size
checksum
checksum_algorithm
internal_id
deleted
store_number
The remaining fields are the responsibility of the Bitstream content management API class.
The bitstream storage manager is fully transaction-safe. In order to implement transaction-safety, the following algorithm is used to store bitstreams:
1. A database connection is created, separately from the currently active connection in the current DSpace context.
2. A unique internal identifier (separate from the database primary key) is generated.
3. The bitstream DB table row is created using this new connection, with the deleted column set to true.
4. The new connection is committed, so the 'deleted' bitstream row is written to the database.
5. The bitstream itself is stored in a file in the configured 'asset store directory', with a directory path and filename derived from the internal ID.
6. The deleted flag in the bitstream row is set to false. This will occur (or not) as part of the current DSpace Context.
This means that should anything go wrong before, during or after the bitstream storage, only one of the following can be true:
Similarly, when a bitstream is deleted for some reason, its deleted flag is set to true as part of the overall transaction, and the corresponding file in storage
is not deleted.
Cleanup
The above techniques mean that the bitstream storage manager is transaction-safe. Over time, the bitstream database table and file store may contain a
number of 'deleted' bitstreams. The cleanup method of BitstreamStorageService goes through these deleted rows, and actually deletes them along with
any corresponding files left in the storage. It only removes 'deleted' bitstreams that are more than one hour old, just in case cleanup is happening in the
middle of a storage operation.
This cleanup can be invoked from the command line via the cleanup command, which can in turn be easily executed from a shell on the server machine
using [dspace]/bin/dspace cleanup. You might like to have this run regularly by cron, though since DSpace is read-lots, write-not-so-much it
doesn't need to be run very often.
# Clean up any deleted files from local storage on first of the month at 2:40am
40 2 1 * * [dspace]/bin/dspace cleanup > /dev/null 2>&1
Backup
The bitstreams (files) in traditional storage may be backed up very easily by simply 'tarring' or 'zipping' the [dspace]/assetstore/ directory (or
whichever directory is configured in dspace.cfg). Restoring is as simple as extracting the backed-up compressed file in the appropriate location.
It is important to note that since the bitstream storage manager holds the bitstreams in storage and information about them in the database, a
database backup and a backup of the files in the bitstream store must be made at the same time; the bitstream data in the database must correspond to
the stored files.
Of course, it isn't really ideal to 'freeze' the system while backing up to ensure that the database and files match up. Since DSpace uses the bitstream data
in the database as the authoritative record, it's best to back up the database before the files. This is because it's better to have a bitstream in storage but
not the database (effectively non-existent to DSpace) than a bitstream record in the database but not storage, since people would be able to find the
bitstream but not actually get the contents.
With DSpace 1.7 and above, there is also the option to backup both files and metadata via the AIP Backup and Restore feature.
While the old bitstore.xml file (defined at [dspace]/config/spring/api/bitstore.xml) still exists in DSpace 7.4, a new configuration file has
been added for configuring the Bitstream Store: [dspace]/config/modules/assetstore.cfg
If you have previously configured bitstore.xml and only have a single assetstore, we recommend resetting bitstore.xml to the default configuration and
using the new assetstore.cfg file to configure your Bitstream Store. This is of course very general advice; your specific situation may require more
care, particularly if you have more than one Bitstream Store (see below; this likely doesn't apply to you, but it could).
To configure traditional filesystem bitstore, as a specific directory, configure the bitstore like this:
#---------------------------------------------------------------#
#-----------------STORAGE CONFIGURATIONS------------------------#
#---------------------------------------------------------------#
# Configuration properties used by the bitstore.xml config file #
# #
#---------------------------------------------------------------#
This would configure store number 0 named localStore, which is a DSBitStore (filesystem), at the filesystem path of ${dspace.dir}/assetstore (i.
e. [dspace]/assetstore/)
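For example, in assetstore.cfg the default single filesystem store is configured roughly as follows. Property names are per the default DSpace 7 configuration; verify them against your installed copy:

```
# store 0 is the primary (incoming) store
assetstore.index.primary = 0
# filesystem path used by the DSBitStore
assetstore.dir = ${dspace.dir}/assetstore
```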
It is also possible to use multiple local filesystems. The following example is specific to the older bitstore.xml configuration; it should still work, but is
untested with DSpace 7.4. In the example below, key #0 is localStore at ${dspace.dir}/assetstore, and key #1 is localStore2 at
/data/assetstore2. Note that incoming is set to store "1", which in this case refers to localStore2. That means that any new files (bitstreams) uploaded to
DSpace will be stored in localStore2, but some existing bitstreams may still exist in localStore.
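A sketch of such a bitstore.xml follows. The bean names and properties mirror the default DSpace file; treat this as illustrative and check it against your installed copy:

```xml
<bean name="org.dspace.storage.bitstore.BitstreamStorageService"
      class="org.dspace.storage.bitstore.BitstreamStorageServiceImpl">
    <!-- incoming = 1 sends newly uploaded bitstreams to localStore2 -->
    <property name="incoming" value="1"/>
    <property name="stores">
        <map>
            <entry key="0" value-ref="localStore"/>
            <entry key="1" value-ref="localStore2"/>
        </map>
    </property>
</bean>

<bean name="localStore" class="org.dspace.storage.bitstore.DSBitStoreService" scope="singleton">
    <property name="baseDir" value="${dspace.dir}/assetstore"/>
</bean>

<bean name="localStore2" class="org.dspace.storage.bitstore.DSBitStoreService" scope="singleton">
    <property name="baseDir" value="/data/assetstore2"/>
</bean>
```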
assetstore.index.primary = 1
#---------------------------------------------------------------#
#-------------- Amazon S3 Specific Configurations --------------#
#---------------------------------------------------------------#
# The below configurations are only used if the primary store name
# is set to 's3Store', or if 's3Store' is configured as a secondary store
# in your bitstore.xml
# Enables or disables store initialization during startup; without initialization the store won't work.
# If changed to true, a lazy initialization will be tried on next store usage; be careful, an exception could be thrown.
assetstore.s3.enabled = true
# Please don't use root credentials in production, but rely on the AWS credentials default
# discovery mechanism to configure them (env var, EC2 IAM role, etc.)
# The preferred approach for security reasons is to use IAM user credentials, but that isn't always possible.
# More information about credentials: https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html
# More information about IAM usage: https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/java-dg-roles.html
assetstore.s3.awsAccessKey = use-the-role-please
assetstore.s3.awsSecretKey = use-the-role-please
The incoming property specifies which assetstore receives incoming assets (i.e. when new files are uploaded, they will be stored in the "incoming"
assetstore). This defaults to store 0. NOTE: in the assetstore.cfg file, this setting is called assetstore.index.primary.
S3BitStore has parameters for awsAccessKey, awsSecretKey, bucketName, awsRegionName (optional), and subfolder (optional).
awsAccessKey and awsSecretKey are created from the Amazon AWS console. You'll want to create an IAM user, and generate a Security
Credential, which provides you the accessKey and secret. Since you need permission to use S3, you could give this IAM user a quick & dirty
policy of AmazonS3FullAccess (for all S3 buckets that you own), or for finer grain controls, you can assign an IAM user to have certain
permissions to certain resources, such as read/write to a specific subfolder within a specific s3 bucket.
bucketName is a globally unique name that distinguishes your S3 bucket. It has to be unique among all other S3 users in the world.
awsRegionName is a region in AWS where S3 will be stored. Default is US Eastern. Consider distance to primary users, and pricing when
choosing the region.
subfolder is a folder within the S3 bucket under which the assets are organized. This is useful if you want to re-use a bucket for multiple purposes
(bucketname/assets vs bucketname/backups) or for multiple DSpace instances (bucketname/XYZDSpace, bucketname/ABCDSpace,
bucketname/ABCDSpaceProduction).
Migrate BitStores
There is a command-line migration tool to move all the assets within one bitstore to another bitstore: bin/dspace bitstore-migrate
[dspace]/bin/dspace bitstore-migrate
usage: BitstoreMigrate
-a,--source <arg> Source assetstore store_number (to lose content). This is a number such as 0 or 1
-b,--destination <arg> Destination assetstore store_number (to gain content). This is a number such as 0 or 1.
-d,--delete Delete file from losing assetstore. (Default: Keep bitstream in old assetstore)
-h,--help Help
-p,--print Print out current assetstore information
-s,--size <arg> Batch commit size. (Default: 1, commit after each file transfer)
[dspace]/bin/dspace bitstore-migrate -p
store[0] == DSBitStore, which has 2 bitstreams.
store[1] == S3BitStore, which has 2 bitstreams.
Incoming assetstore is store[1]
[dspace]/bin/dspace bitstore-migrate -a 0 -b 1
[dspace]/bin/dspace bitstore-migrate -p
store[0] == DSBitStore, which has 0 bitstreams.
store[1] == S3BitStore, which has 4 bitstreams.
Incoming assetstore is store[1]
History
Changes in 7.x
Changes in Older Releases
Changes in 7.x
Changes in DSpace 7.6.3
Changes in DSpace 7.6.2
Changes in DSpace 7.6.1
Changes in DSpace 7.6
Changes in DSpace 7.5
Changes in DSpace 7.4
Changes in DSpace 7.3
Changes in DSpace 7.2
Changes in DSpace 7.1.1
Changes in DSpace 7.1
Changes in DSpace 7.0
Changes in DSpace 7.1
UI Changes: https://github.com/DSpace/dspace-angular/milestone/12?closed=1
REST API Changes: https://github.com/DSpace/DSpace/milestone/36?closed=1
Changes in Older Releases
All historical changes from older releases of DSpace may be found in the online documentation for those older releases.
Learning DSpace
The DSpace Community Advisory Team (DCAT) is developing this user-facing guide to DSpace 7. All are welcome to participate.
Pages
Community and Collection management
Collection Management
Create Collection
Delete Collection
Edit Collection
Export Collection
Community Management
Create a Community
Delete Community
Edit Community
Content (Item) management
Add item
Delete item
Edit Item
Authorizations (Manage access to an item)
Collection Mapper
Edit Bitstream
Edit Metadata
Edit Relationship
Make an Item Private
Move an Item
Reinstate an item
Versioned Item
Withdraw an item
Embargo an item
Lease an item
DSpace 7 Demo Quick Start
Management sidebar
Menus
Registry management
Metadata Registry Management
Request-a-copy
Search - Advanced
User management
Add or Manage an E-Person
Create or manage a user group
Videos
https://new.d2t.co/knowledge-center
User management
Registry management
Administrative search
Pages not yet created; please feel free to add, and start writing, these pages for topics requested by the DSpace Community Advisory Team (DCAT):
For submitters
Add item
Embargo an item
Lease an item
Editing Guidelines
Crop images before loading.
Add a border around each image.
Add an image as "large" if text is small.
Avoid tables, if possible.
Community and Collection management
Documentation for repository managers.
Collection Management
Create Collection
Delete Collection
Edit Collection
Export Collection
Community Management
Create a Community
Delete Community
Edit Community
Collection Management
The collection is a level within a community or sub-community that holds items. This document provides an overview of creating, editing, and deleting a
collection.
The documentation below assumes that the user has the relevant authorizations. For example, the admin menu and edit buttons would appear to a user
having collection administration permission.
If you’re unsure about collection administration permissions assigned to your account for the target community, contact your system administrator.
Create Collection
Delete Collection
Edit Collection
Export Collection
Create Collection
Audience
Create Collection
Audience
Repository Administrator
Community Administrator
Collection Administrator
Create Collection
Step 1: Login using your credentials
Step 2: Users will see the admin menu on the left-hand side of the screen.
Step 3: Click on “New” and click on “Collection” to proceed with the collection creation process.
Step 4: A pop-up window showing a list of communities will appear. Type the name of the community where you want to add this collection in the search field.
Upon typing a few letters of the community’s name, a list of communities having those letters or words will appear. Click on the community name to initiate
collection creation.
Step 5: A Create Collection form opens to populate information regarding the collection.
1. Collection logo – Click on the ‘browse’ link to select an image file to add as the collection’s logo. It is advisable to maintain
uniform dimensions of the logo across the repository.
2. Name – The collection’s name is a mandatory field and is marked with ‘*’.
3. Introduction text (HTML) – Users can enter introductory text providing an overview of the contents stored in this collection.
One can utilize HTML tags to format the text or continue entering the plain text content.
4. Short Description – This field can have a one-line description of the collection that appears with the collection name on the
community homepage.
5. Copyright text (HTML) – Users can enter copyrights related information here. Fields marked with (HTML) support HTML
tags-based formatting.
6. News (HTML) – Enter news about this collection. Users can update this later via Edit Collection.
7. License – Add license-related information here.
8. Entity Type – Select from the drop-down the type of entity that will be uploaded to the collection.
9. Action Buttons – Users can click on the appropriate button. Clicking on the Save button will add the collection
to the repository.
Step 6: Click on the ‘Save’ button to complete the Collection creation process. A success prompt will pop up upon collection creation, and the application
will automatically open the collection homepage.
Success prompt upon collection creation
Collection homepage
Delete Collection
Audience
Delete Collection
Audience
Repository Administrator
Community Administrator
Collection Administrator
Delete Collection
Step 1: Login using your credentials
Step 2: There are multiple ways to initiate the delete collection process. One of them is by going to the target collection using the Admin options. Hover your
cursor over the “Edit” sign.
Step 3: Click on “Edit” and click on “Collection” to proceed with the edit collection process.
Step 4: A pop-up with the list of collections will appear. Type the name of the collection you want to delete in the search field. A list of collections matching
the typed values will appear upon typing a few letters of the collection’s title. Click on the collection to continue with the deletion.
Step 5: The application will take the user to the edit collection form. To initiate the collection deletion, the user must click the ‘Delete this collection’ button.
Step 6: Click on the Confirm button to continue with the collection deletion or click on the Cancel button to return to the previous page.
A success prompt confirming the deletion will appear, and the DSpace homepage will open.
Edit Collection
Audience
Edit Collection
Edit Metadata
Template Item
Assign Roles
Content Source
Curate
Authorizations
Item Mapper
Manage mapped items
Map new items
Audience
Repository Administrator
Community Administrator
Collection Administrator
Edit Collection
Step 1: Login using your credentials
Step 2: There are multiple ways to initiate the edit collection process. One of them is by going to the target collection using the Admin options. Hover your
cursor over the “Edit” sign.
Step 3: Click on “Edit” and click on “Collection” to proceed with the edit collection process.
Step 4: A pop-up window showing a list of collections will appear. Type the name of the collection you want to edit in the search field. Upon typing a few letters of
the collection’s name, a list of collections with those letters or words will appear. Click on the collection to initiate editing.
Step 5: The application will take the user to the edit collection form. The user can perform a range of actions to edit the collection. Each tab is explained in
a separate process in this document.
1. Delete this collection – The button provided for deleting the collection. Detailed steps are explained in the latter part of this
document.
2. Tabs – Edit collection involves a variety of activities, which are grouped logically across various tabs. Below
is a summary of these tabs:
a. Edit Metadata – Tab covers activities related to editing Collection’s profile information
b. Assign Roles – This tab allows users to create specific roles for the collection
c. Content Source – This tab enables harvesting the contents from various sources using OAI standards
d. Curate – Users can set up various workflows related to content curation in this tab
e. Authorizations – Under this tab, users can manage various groups created for managing different access rights
and workflows specific to the collection
Edit Metadata
Step 6: The Edit Metadata tab allows users to update the collection’s profile-related information, a.k.a. collection metadata.
Various actions on this tab are explained below the Edit Metadata illustration.
1. Template Item – Users can add metadata elements with pre-populated values that will appear during item submission in this collection. Item submitters
can update or delete these pre-populated values during the submission process.
2. Collection logo – Click on the delete button to remove the existing logo, or add one if no logo exists.
3. Name – Update the existing collection name in this field.
4. Introduction text (HTML) – Update introductory text if already added or can add new text. One can utilize HTML tags to format the text or continue
entering the plain text content.
5. Short Description – Update the collection description or add a fresh short description for the collection.
6. Copyright text (HTML) – Update copyrights related information in this field. Fields marked with (HTML) support HTML tags-based formatting.
7. News (HTML) – Add/Update news specific to this collection in the field.
8. License – Add/Update license-related information here.
9. Entity Type – Once an entity type has been added to the collection, the value remains constant and cannot be edited.
10. Action Button – Clicking on the Save button will update the metadata information for the collection.
Template Item
Step 7: Click the ‘Edit’ button under the Template Item section to add metadata elements with pre-populated values for the item submission process.
Step 9: Users can start typing metadata elements as demonstrated below and select the appropriate component from the drop-down list.
Step 10: Enter the desired value in the Value field, and enter the ISO code of the language used. Then, click on the Add button to add the template
metadata element.
Step 11: Users can click on the action buttons appearing to the right of the added element to update or delete its value.
Step 12: Click on the Save button to finish the template edit process. A pop-up notification confirming successful updates of the metadata element will
appear, as demonstrated below.
Step 13: Click on the Save button appearing at the bottom of the Edit Metadata tab to save all updates. A success prompt will appear, and the collection
homepage will open.
Assign Roles
Step 14: This tab helps assign users to roles defined for the collection. These roles cover administrative, maker-checker, and content consumption
activities. A description of each role is provided below the screenshot.
1. Administrators – The collection administrator can assign rights like item submission, edit item metadata, and map existing items from other
collections to this collection. Click the create button to create a dedicated Administrator group for the collection.
2. Submitters – Users or User groups part of this group can submit items to the collection. Click on the create button to add specific users and user
groups to perform item submission.
3. Default item read access – E-People and Groups in this role can read new items submitted to this collection. Changes to this role are not retroactive:
existing items in the system will still be viewable by those who had read access at the time of their addition. Click the restrict button to restrict default item
read access to a specific user group.
4. Default bitstream read access – E-People and Groups added in this section can read bitstreams (attachments) in items by default. Click the
restrict button to restrict default bitstream read access rights to a specific user group.
5. Editors - Editors can edit the metadata of submissions and then accept or reject them. Click on the create button to add the workflow step of
editing metadata and assigning roles to specific users or user groups.
6. Final editors - Final editors can edit the metadata of incoming submissions but can not reject them. Click the create button to add this workflow
step to the collection and assign a role to specific users or user groups.
7. Reviewers - Reviewers can accept or reject incoming submissions. However, they can not edit the metadata. Click the create button to add this
workflow step to the collection and assign a role to specific users or user groups.
Content Source
Step 16: This tab enables harvesting the content from external sources using OAI standards. Users can start harvesting by clicking the checkbox, “This
collection harvests its content from an external source.”
Step 17: Users will see various parameters related to OAI-based content harvesting upon clicking the checkbox as explained in the previous step. Below is
the explanation of elements appearing under Configure an external source header.
1. OAI Provider – Enter the source OAI provider’s URL.
2. OAI specific set id – Enter the set ID to source content.
3. Metadata Format – Select suitable metadata format using the dropdown list, e.g., Simple Dublin Core, Qualified Dublin Core, and DSpace
Intermediate metadata.
4. Harvest metadata only – Select this option to harvest only metadata from the source.
5. Harvest metadata and references to bitstreams (requires ORE support) – Click on this option to harvest metadata and reference links to
corresponding bitstreams.
6. Harvest metadata and bitstream (requires ORE support) – Use this option to harvest both metadata and corresponding bitstreams into the target
repository.
7. Click on the 'Save' button to update harvesting settings.
8. Upon clicking the save button, and subject to successful validation of the values entered, the “Harvest Status” will change to “Ready,” as demonstrated in the
screenshot below.
9. After successfully configuring an OAI profile, these buttons will become active, and the user can start harvesting data immediately.
10. Users can click the “Test configuration” button to test settings and see a response message, as demonstrated on the screen below.
11. Upon successfully testing settings, click on the “Import now” button to harvest metadata immediately.
Curate
Step 18: The Curate tab provides various workflows for curating items stored in the collection. Below are the standard flows; there can be customized
curation workflows as well.
Users must select a workflow from the dropdown list and click the “Start” button to initiate the curation process.
Authorizations
Step 19: The Authorizations tab lists all the policies defined for the collection. These are in addition to policies created from the “Assign Roles” tab. Key actions
available in this tab are explained below.
1. Manage Policies – Click the Add button to create a new policy, or select policies from the table and click on the Delete selected button for a
batch deletion of the policies.
2. Edit policy and members in a policy – Click the edit button to edit an individual policy or click on the group icon to edit the user group.
Step 20: Click on the Add button to create a new policy.
Step 21: Users can fill in the fields available in this form to define the policy and save it by clicking the submit button. Please see the description of
each field below the screenshot.
1. Name: Enter the Policy name in this field.
2. Description: Enter the Policy description here for future reference and understanding of other users.
3. Select the policy type: The user can select one of the following policy classification types from the list
a. TYPE_SUBMISSION: a policy in place during the submission
b. TYPE_WORKFLOW: a policy in place during the approval workflow
c. TYPE_INHERITED: a policy that has been inherited from a container (the collection)
d. TYPE_CUSTOM: a policy defined by the user during the submission or workflow phase
4. Select the action type: The user can select one of the following actions from the dropdown list:
a. READ
b. WRITE
c. REMOVE
d. ADMIN
e. DELETE
f. WITHDRAWN_READ
g. DEFAULT_BITSTREAM_READ
h. DEFAULT_ITEM_READ
5. Start date – end date: The user can select the start date and end date for using the policy, should they want to apply it for a fixed period.
6. The eperson or group that will be granted the permission: List of users/groups selected for granting permission under the policy
7. Search for an ePerson / Search for a group: Choose whether to search for an ePerson or a group
8. Search field: Enter keywords for searching the ePerson/Group
9. ePerson/Group list: Click on the select button against the user/group you want to add to the policy
10. Submit/Cancel button: Click on the Submit button to complete policy creation or click on the Cancel button to cancel the entire process.
You’ll see a confirmation prompt upon successfully creating the policy, as shown below. After that, the user will be back on the Authorizations screen.
Item Mapper
Manage mapped items
Step 22: The item mapper tab allows users to map items from other collections and manage mapped items.
Step 23: You’ll see items mapped to the collection under the “Browse mapped items” tab. Click on the checkbox appearing with each item to select the
item(s) to be unmapped.
Step 24: After selecting the items to be unmapped, please click on “Remove selected item mappings” to complete the operation. Click on the “Cancel”
button to the left of “Remove selected item mappings” to cancel the process.
Step 26: After confirming that the target item does not already appear in the mapped items list, please click on “Map new items.” Then, enter keywords
or keyphrases in the search field to search for target items.
Note that you can enter keywords or keyphrases from any metadata field. The search field under “Map new items” works exactly like the basic search
field of DSpace.
Step 27: Users can select target items from the search results by clicking the checkbox appearing with items.
Step 28: After selecting target items, please click on the “Map selected items” button at the bottom of the page to complete the item mapping process.
Click the “Cancel” button to cancel the activity and return to the collection edit page.
Step 29: A prompt confirming successful mapping of items will appear upon completing the task, as demonstrated below.
Mapped items will appear in the collection and under the “Browse mapped items” tab, as demonstrated below.
Export Collection
DSpace provides a feature for exporting the metadata of any collection into CSV format. Users can utilize this CSV file for multiple purposes, such as creating ad-
hoc reports, importing metadata into other systems, or any other use case as per their requirements.
Audience
Exporting a collection
Audience
1. Repository Administrator
2. Community Administrator
3. Collection Administrator
4. Basic user
Exporting a collection
Users log in using their credentials and follow the steps mentioned below to export a collection’s metadata.
Step 1: Go to the DSpace home page and click on the “Log In” link at the top right corner of the screen, as illustrated below.
Step 2: Users will see the admin menu on the left-hand side of the screen, as highlighted in the illustration.
Step 3: Hover your cursor over the Export menu and click on Metadata.
Step 4: Type the collection’s name in the textbox and click on the target collection from the list appearing in the popup.
Step 5: Click on the “Export” button in the popup to continue with the item metadata-export or click the “Cancel” button to cancel the process.
Users will see a success prompt confirming the creation of the export process upon successful completion, or else the application will
show a failure prompt.
Step 6: Users will be redirected to the metadata export page with a csv download link, as highlighted in the screenshot below. Click on the link to download
the file.
Click on the CSV file link to download the metadata CSV. This file contains metadata of items stored in the exported collection.
Click on the log file link to download it. The log file contains details of the steps performed during the export job.
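As an example of the ad-hoc reporting mentioned above, the exported CSV can be post-processed with a short script. The column names below follow DSpace’s metadata CSV conventions (an id column plus one column per metadata field, with multi-valued fields assumed to be joined by “||”), but verify them against the header row of your own export:

```python
# Sketch: reading a DSpace collection metadata export for a quick report.
import csv
import io

# A miniature stand-in for the downloaded export file.
sample = ('id,collection,dc.title,dc.contributor.author\n'
          '1234,123456789/2,An example item,"Doe, Jane||Roe, Richard"\n')

rows = list(csv.DictReader(io.StringIO(sample)))
for row in rows:
    # Multi-valued metadata fields are split on the "||" separator.
    authors = row["dc.contributor.author"].split("||")
    print(row["dc.title"], "-", len(authors), "author(s)")
```

In practice you would open the downloaded file with `open(path, newline="", encoding="utf-8")` instead of the inline sample string.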
Community Management
A community is the primary storage level in DSpace’s storage hierarchy that holds sub-communities and collections. This document provides an
overview of creating, editing, and deleting a community. The documentation below assumes that the user has the relevant authorizations. For example, the
admin menu and edit buttons would appear to a user having community administration permission.
If you’re unsure about community administration permissions assigned to your account for the target community, contact your system administrator.
Create a Community
Delete Community
Edit Community
Create a Community
Audience
Create Community
A Community is the primary storage level in DSpace’s storage hierarchy that holds Sub-Communities and Collections. This document provides an
overview of creating, editing, and deleting a Community. The documentation below assumes that the user has the relevant authorizations. For example,
the admin menu and edit buttons would appear to a user having community administration permission.
If you’re unsure about community administration permissions assigned to your account for the target community, contact your system administrator.
Audience
1. Repository Administrator
2. Community Administrator
Create Community
Step 1: Login using your credentials
Step 2: Users will see the admin menu on the left-hand side of the screen.
Step 3: Click on the “New” link and click on “Community” to proceed with the community creation.
Step 4: A popup providing the option to either create a parent community or a sub-community will appear, with a list showing existing communities. Create
your new community either as a top-level community or as a sub-community of an existing community selected from the list.
Step 5: As per the user’s choice in the previous step, the application will open the create community or create a sub-community form to populate
information regarding the community’s profile. Below is the explanation of the information that needs to be populated on this form.
It is important to understand that both “Create Community” and “Create Sub-community” forms are identical. The critical difference between both is that the
“Create Community” form helps create a top-level community while the latter helps create a sub-community within a community or a sub-community.
The description provided below the following screenshot remains identical for both Community and Sub-community creation.
1. Community logo – Select the community’s logo by clicking on the ‘browse’ link to select an image file. It is advisable to maintain uniform
dimensions of the logo across the repository.
2. Name – The community’s name is a mandatory field marked with ‘*’.
3. Introductory text (HTML) – Users can add introductory text providing an overview of the contents stored in the community. One can utilize HTML
tags to format the text or continue entering plain text content.
4. Short Description – This field can have a one-line description of the community that displays with the community name in the list of communities
on the parent community page (or on the DSpace homepage in the case of a top-level community).
5. Copyright text (HTML) – Users can enter copyright information here. Fields marked with (HTML) support HTML tag-based formatting.
6. News (HTML) – Enter news about this community. Users can update this regularly via Edit Community.
7. Action Buttons – Users can click on the appropriate button as determined. Clicking on the Save button will add the community to the repository.
Step 6: Click on the ‘Save’ button to complete the community creation. A success prompt will pop up upon community creation, and the user will be
redirected to the community homepage.
Success prompt upon community creation
Community homepage
Delete Community
Audience
Delete Community
Audience
1. Repository Administrator
2. Community Administrator
Delete Community
Step 1: Login using your credentials
Step 2: There are multiple ways to navigate to the controls to delete a community. One of them is by going to the target community and clicking on the
button with the pencil icon next to the community title, i.e. the ‘Edit community’ button. Alternatively, follow the steps provided here.
Step 3: Click on “Edit” and click on “Community” to proceed with the edit community process.
Step 4: A popup showing a search box and a list of communities will appear. Type the name of the community you want to edit in the search field. Upon
typing a few letters of the community’s name, a list of the communities having those letters or words will appear. Click on the target community to initiate
editing.
Step 5: The application will take the user to the edit community form. To initiate the community deletion process, the user needs to click on the ‘Delete this
community’ button.
Step 6: Click on the Confirm button to continue with the community deletion or click on the Cancel button to return to the previous page.
Users will be redirected to the DSpace homepage upon successful completion of the community deletion, and a popup confirming the deletion
will appear.
Edit Community
Audience
Edit Community
Edit Metadata
Assign Roles
Curate
Authorizations
Audience
1. Repository Administrator
2. Community Administrator
Edit Community
Step 1: Login using your credentials
Step 2: There are multiple ways to start editing a community. One of them is by going to the target community and clicking on the Edit button, the button
with the pencil icon, beside the page title. Alternatively, follow the steps provided here.
Step 3: Click on “Edit” and click on “Community” to proceed with the edit community process.
Step 4: A popup showing a list of the communities in DSpace and a search box will appear. If you are already on the page of the community, it will
appear at the top of the list, so you can select it by clicking on it. Otherwise, type the name of the community you want to edit in the search field. Upon
typing a few letters of the community’s name, a list of communities matching those words will appear. Click on the target community to initiate editing.
The application will take the user to the edit community form to perform various actions to edit the community. Each tab is explained in a separate process
in this document.
1. Delete this community – The button provided for deleting the community. Detailed steps are explained in the latter part of this page.
2. Tabs – Edit community has a variety of functions, which are grouped logically across various tabs. Below is a summary of these tabs:
a. Edit Metadata – Tab covers activities related to editing the community’s profile information.
b. Assign Roles – This tab allows users to create specific roles for the community, usually, the role of Administrator of the community, see
further detail below.
c. Curate – Users can set up various workflows related to content curation in this tab
d. Authorizations – Under this tab, users can manage various groups and their different access rights in the community, for example, this
tab could be used to grant an individual the administrator role, see further detail below.
Edit Metadata
The Edit Metadata tab allows users to update the community’s profile-related information, a.k.a. community metadata.
Various actions on this tab are explained immediately after the Edit Metadata illustration is added below.
1. Community logo – Click on the delete button to remove the existing logo. If no logo exists, then a widget allowing the user to add a logo is
displayed here.
2. Name – Update the existing community’s name in this field.
3. Introduction text (HTML) – Update introductory text if already added or can add new text. One can utilize HTML tags to format the text or continue
entering plain text content.
4. Short Description – Update the description of the community or add a fresh short description for the community.
5. Copyright text (HTML) – Update copyright-related information in this field. This is usually displayed at the foot of the community landing page.
Fields marked with (HTML) support HTML tags-based formatting.
6. News (HTML) – Add/Update news specific to this community in the field. This is usually displayed with the heading ‘News’, underneath the
community’s introductory text, and above the list of collections and sub-communities.
7. Action Button – Clicking on the Save button will update the metadata information for the community.
Click on the ‘Save’ button to save the information updated in the ‘Edit Community’ tab. A success prompt will appear, confirming the successful edit of the
community.
Assign Roles
This tab allows authorized users to create a Community administrator role. Click on the “create” button to assign a community administrator role.
The roles available on this tab are explained below this illustration.
Administrators - Community administrators can create and manage sub-communities and collections. This user profile can also assign rights to edit item
metadata and map existing items from other collections.
Curate
This tab provides various workflows for curating items stored in the community. Below are the standard flows; there can be customized curation workflows
as well.
Users must select a workflow from the dropdown list and click the “Start” button to initiate the curation process.
Authorizations
Users can view and edit community resource policies defined for the community, in the Authorizations tab. Users can create policies in addition to the
standard policies created from the Assign Roles tab. Following are the key actions in this tab.
1. Manage Policies – Click on the Add button to create a new resource policy (see further detail below), or select policies from the table and click on
the Delete selected button for a batch deletion of the policies.
2. Edit policy and members in a policy – Click on the edit button to edit an individual policy, or click on the group icon to edit the user group, e.g. to add
or remove individual ePersons.
Click on the Add button to create a new Authorization policy.
Users can enter the information to create the policy and click on the submit button. Please see the description of each field below the
screenshot.
1. Name: Enter the Policy name in this field.
2. Description: Enter the Policy description here for future reference and understanding of other users.
3. Select the policy type: The user can select one of the following policy classification types from the list
a. TYPE_SUBMISSION: a policy in place during the submission
b. TYPE_WORKFLOW: a policy in place during the approval workflow
c. TYPE_INHERITED: a policy that has been inherited from a container (the community)
d. TYPE_CUSTOM: a policy defined by the user during the submission or workflow phase
4. Select the action type: The user can select one of the following actions from the dropdown list. For example, select “READ” to assign read rights
to the user or user group:
a. READ
b. WRITE
c. REMOVE
d. ADMIN
e. DELETE
f. WITHDRAWN_READ (disables item access)
g. DEFAULT_BITSTREAM_READ
h. DEFAULT_ITEM_READ
5. Start date – end date: The user can select the start date and end date of the period for which the policy will be active, should they want to apply
this policy for a fixed period only. If the start date is left blank, the policy comes into effect immediately.
6. The ePerson or group that will be granted the permission: List of users/groups selected for granting permission under the policy
7. Search for an ePerson / Search for a group: Select ePerson or group to add
8. Search field: Enter keywords for searching the ePerson/Group
9. ePerson/Group list: Click on the select button against the user/group you want to add to the policy
10. Submit/Cancel button: Click on the Submit button to complete policy creation or click on the Cancel button to cancel the entire process.
Upon successfully creating the policy, you’ll see a confirmation prompt, and the user will be back on the Authorizations screen.
Content (Item) management
Documentation for repository managers.
Add item
Delete item
Edit Item
Authorizations (Manage access to an item)
Collection Mapper
Edit Bitstream
Edit Metadata
Edit Relationship
Make an Item Private
Move an Item
Reinstate an item
Versioned Item
Withdraw an item
Embargo an item
Lease an item
Add item
Target Audience
Overview
Submission Form Highlights
Item Submission Process
Target Audience
Content Submitters
Overview
The item submission process lets authorized users deposit content using metadata and bitstreams. It primarily consists of the following components.
Submission Form Highlights
1. Bitstream upload section
2. Target Collection
5. Bitstreams Management
6. Deposit License
d. Deposit: Click this button to complete the submission. The item will go to the next step as per the workflow defined for the collection.
Item Submission Process
Step 1: Login using your credentials
Step 2: Users will see the admin menu on the left-hand side of the screen.
Step 3: Click on “New” and click on “Item” to proceed with the item addition process.
Step 4: A popup window with a collection list will appear. The user can select the target collection by typing its name or scrolling down the collection list.
Then, click on the collection to initiate item submission.
Step 5: Users will see the item submission form after selecting the target collection. The first step is to upload the attachment(s) to the item. In DSpace
terminology, an attachment is known as a “bitstream”.
Click on the “browse” link to upload attachment(s). Users can upload multiple files by selecting them together or dragging them into the space.
A progress bar showing bitstream upload progress will appear, as demonstrated in the illustration below. In addition, after a successful bitstream upload, a
prompt confirming success or failure will appear.
Bitstream upload in progress
Step 6: After bitstream upload, the next step is to describe the item by adding metadata.
Metadata fields marked with “*” are mandatory, and users must populate these fields to complete the submission. A few
examples in the standard submission form are Author, Title, and Date of Issue.
Users will notice the alert mark at the top right of the “Describe” tab turning from amber to green once all mandatory fields have values. Below is an illustration
showing the state of the “Describe” section with values in all mandatory fields.
Step 7: The user can further update bitstreams by clicking on the buttons appearing next to the bitstream title.
Download: Click this button to download the bitstream to a local machine.
Edit: Update bitstream details and access rights using this button. More explanation is provided below.
Delete: Clicking this button will delete the bitstream from the submission form.
Step 8: By clicking the edit button next to the bitstream, users can update bitstream information, as explained below.
Update the bitstream title and add descriptions to further describe the attachment. Please refer to the illustration below demonstrating both functions.
Users can define access conditions for the bitstream by selecting the appropriate option from the dropdown list. These options are:
Open Access: Select this option to make the bitstream available without any restriction.
Lease: This option applies when a user wants to keep the bitstream accessible until a specific date in the future. After the date defined in the “Grant access until” field, the bitstream will no longer be available as open-access content.
Embargo: In contrast to a lease, an embargo allows the user to keep bitstream access restricted until a future date, defined in the “Grant access from” field. After this date, the bitstream will be available as open-access content.
Administrator: Select this option if the bitstream’s access should remain limited to administrators.
Step 9: Finally, users must click on the “I confirm the license above” checkbox to accept the deposit license and click on the “Deposit” button to complete
the item submission.
Delete item
Target Audience
Process Overview
Item Delete Process
Target Audience
Content Submitters
Community Administrators
System Administrators
Process Overview
As the name suggests, the “Permanently Delete” option is used when an authorized user wants to permanently delete an item (metadata and bitstreams) from the repository.
Apart from permanently deleting an item, the “Withdraw item from repository” and “Make item Private” options can temporarily disable access to the content.
Step 3: Click on the “Edit” button as highlighted on the screen below. This button appears only to users with edit rights on the target item.
Step 4: Click the “Permanently Delete” Button to delete the item.
Step 5: Click on the Delete button on the confirmation screen to continue with the deletion, or click on Cancel to cancel the permanent deletion of the item from DSpace.
Edit Item
Audience
Edit Item Overview
Status tab
Bitstreams tab
Metadata tab
Relationships tab
Version History Tab
Audience
Content Submitters
Community Administrators
System Administrators
Status tab
Bitstreams tab
Edit metadata associated with the bitstreams, including filename, file format and file description
Metadata tab
Add or delete metadata fields (i.e., elements added to the Item) or edit values in existing fields. See further detail below.
Relationships tab
Add or delete relationships with other items, or edit existing relationships. See further details below.
Version History Tab
Authorizations (Manage access to an item)
Overview
Add Authorization Policy
Manage Policy
Delete Policy
Overview
Step 1: Login using your credentials
1. Search an item
2. Browse communities and collections
3. Find an item in the Administration section at Edit > Item
Step 3: Click the "Authorizations" button under the "Status" tab to manage the Item's authorization policies.
Users can create different policies for both Item and bitstreams. These are:
Step 2: Users can populate information about the policy on the "Create new resource policy" page and click on the "Submit" button. Please see the description of each field below the screenshot.
1. Name: Enter the policy name.
2. Description: Enter the policy description for future reference and for the understanding of other users.
3. Select the policy type (Required): The user can select one of the following policy types:
a. TYPE_SUBMISSION: a policy in place during the submission
b. TYPE_WORKFLOW: a policy in place during the approval workflow
c. TYPE_INHERITED: a policy that has been inherited from a container (the collection)
d. TYPE_CUSTOM: a policy defined by the user during the submission or workflow phase
Users will see a success prompt upon policy creation, as shown below, and will return to the Manage Policies screen.
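For scripted administration, the same policy fields can in principle be submitted through the REST API that backs this form. The sketch below only assembles the request; the endpoint path follows the DSpace 7 RestContract, while the server URL, item UUID, and policy values are placeholders chosen for illustration, not values taken from this manual.

```shell
# Sketch only: creating a resource policy over REST.
# The UUID and policy values below are placeholders.
SERVER="https://demo.dspace.org/server"
ITEM_UUID="00000000-0000-0000-0000-000000000000"
URL="${SERVER}/api/authz/resourcepolicies?resource=${ITEM_UUID}"
BODY='{"name":"example-policy","description":"Illustration only","policyType":"TYPE_CUSTOM","action":"READ"}'
echo "POST ${URL}"
echo "${BODY}"
# With a valid login token, the actual call would be along these lines:
# curl -s -X POST -H "Authorization: Bearer $TOKEN" \
#      -H "Content-Type: application/json" -d "$BODY" "$URL"
```

The policyType values in the body mirror the list above; TYPE_CUSTOM is the one normally set by users through this screen.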
Manage Policy
Step 1: Click on the Edit policy icon appearing against each policy to update it.
The user group button next to the Edit policy icon will take users to user group management. Please refer to the relevant section for more details.
Step 2: Update policy information on the "Edit resource policy" page and click on the "Submit" button. Please see the description of each field appearing on
the "Edit resource policy" page after the screenshot.
Users will see a success prompt upon the policy update, as shown below, and will return to the Manage Policies screen.
Delete Policy
Step 1: Click on the check box on the left-hand side of each policy, and the "Delete Selected" button will be activated.
Step 2: Click the "Delete selected" button to delete the policy. Please note that a deleted policy is irrecoverable.
Collection Mapper
Manage mapped items to collections
Manage Mapped Collections
Map new collections
1. Search an item
2. Browse communities and collections
3. Find an item in the Administration section at Edit > Item
Users can perform multiple functions in the Collection Mapper tab. These are:
The "Remove item's mapping for selected collections" button will activate upon selecting the collection.
Step 2: Click on “Remove item’s mapping for selected collections” to unmap selected collection(s) or click “Cancel” to cancel the operation.
A prompt confirming successful unmapping of the collection will appear, and the selected collection will disappear from the list in the “Browse mapped
collections” tab.
Step 2: Enter the name of the collection you want to map with this Item and click on the “Search” button.
Step 3: Click the checkbox on the left of target collections to select them. Click the “Map item to selected collections” button to complete mapping, or use
the “Cancel” button to cancel the operation.
A success prompt confirming collection mapping will appear. The selected collection will appear under the “Browse mapped collections” tab.
Edit Bitstream
Edit Bitstream Process
Add a Bitstream or Bundle in an item
Edit a Bitstream or Bundle in an item
Click on the “Edit” button appearing on the right-hand side of the item title.
Step 3: Click on the “Bitstreams” tab to edit the bitstreams.
Users can perform multiple functions in the bitstream tab. These are:
Step 2: Enter the bundle name or select existing names appearing in the dropdown list.
Step 3: Click on the “Create bundle” button to create a bundle or click on “Cancel” to cancel the operation.
Step 4: Drag and drop the bitstream(s) you want to attach to the bundle, or you can click on the “browse” link appearing in the file upload section.
Step 5: After a successful bitstream(s) upload, the user can add more details about the bitstream on the next screen. Click on the “Save” button at the
bottom of the page to save details, or click the “Cancel” button to discard updates.
1. Primary bitstream: Click on this button to set the bitstream as the primary bitstream for that bundle. If another bitstream was the primary bitstream
beforehand, it will be replaced with this one. The thumbnail for the primary bitstream of the ORIGINAL bundle will be used as the main thumbnail
for the item. (Note that the image media viewer currently doesn't take primary bitstreams into account, see https://github.com/DSpace/dspace-
angular/issues/2310)
2. Filename: The default value appearing in this field is the attachment’s filename. However, users can replace it with a value of their choice.
3. Description: Users can add a description of the attachment in this field.
4. Embargo until date: Users can select a future date to restrict public access to the attachment. Additionally, users can grant access to a specific
set of users by selecting user groups.
5. Selected format: If the file extension of the uploaded attachment exists in DSpace’s bitstream registry, the user will see the registered value for that extension in this field. Users can change the value by selecting another one from the dropdown.
After clicking the “Save” button, the user will be redirected to the bitstream tab. A prompt confirming success or failure will appear.
Edit a Bitstream or Bundle in an item
Step 1: Click on a bitstream and drag it above or below another bitstream(s) in the bundle to change the bitstream’s sequence.
Step 2: Apart from adjusting the bitstream’s sequence in a bundle, below are other options available:
1. Download Bitstream: Click on this button to download the attachment to the local device.
2. Edit Bitstream: Click on the “Edit bitstream” button to edit details. More details are given in the following steps.
3. Delete Bitstream: Click on the Delete Bitstream button to delete the bitstream from the bundle.
Step 3: Click on the Edit button shown on the screen above to edit the bitstream details. Below is a description of the various fields appearing on this form.
1. Primary bitstream: Click on this button to set the bitstream as the primary bitstream for that bundle. If another bitstream was the primary bitstream
beforehand, it will be replaced with this one. The thumbnail for the primary bitstream of the ORIGINAL bundle will be used as the main thumbnail
for the item. (Note that the image media viewer currently doesn't take primary bitstreams into account, see https://github.com/DSpace/dspace-
angular/issues/2310)
2. Filename: The default value appearing in this field is the attachment’s filename. Users can replace it with the value of their choice.
3. Description: Users can add a description of the attachment in this field.
4. Embargo until date: Users can select a future date to restrict public access to the attachment. Additionally, users can grant access to a specific
set of users by selecting user groups.
5. Selected format: If the file extension of the uploaded attachment exists in DSpace’s bitstream registry, the user will see the registered value for that extension in this field. Users can change the value by selecting another one from the dropdown.
After clicking the “Save” button, the user will be redirected to the bitstream tab. A prompt confirming success or failure will appear.
Step 4: Click on the Delete button to delete a bitstream. The attachment you want to delete will be highlighted with a red background for confirmation.
Click on the “Save” button below the bitstreams list to continue with the deletion. Otherwise, click on the “Discard” button to cancel the process.
Edit Metadata
Edit Metadata Process
Add a metadata field
Edit an existing metadata field
Delete an existing metadata field
Click on the “Edit” button appearing on the right-hand side of the item title.
Step 3: Click on the “Metadata” tab to edit the metadata.
Step 4: Users can perform multiple actions in the Edit Metadata section, which are listed after the screenshot.
1. Add: Button for adding new metadata elements to the existing Item.
2. Metadata fields: This column shows the metadata element; its value appears in the “Value” column.
3. Edit: This panel contains various options to update the specific metadata field. They are:
a. Edit value – Click this button to edit the existing metadata value
b. Delete metadata field – Click this button to delete the metadata field from the Item
c. Undo changes – Click this button to undo changes made in the metadata field
Step 2: Upon typing a few characters of the metadata element, users will notice a drop-down list showing metadata elements matching the entered value.
Users can select the appropriate metadata element from the drop-down list.
Step 3: After selecting the required metadata element, enter the metadata value in the “Value” field and the ISO code of the language under the Lang field.
For example, enter "en" for English.
Step 4: Click on the “Complete” button as highlighted below to update the field.
Step 5: Click on the “Save” button to continue saving changes or “Discard” to cancel changes made in the metadata fields. A success prompt confirming
Metadata updates will appear as shown below.
Edit an existing metadata field
Step 1: Click on the “Edit” button to edit an existing metadata field.
Step 2: The metadata field becomes editable after clicking the “Edit” button to edit the metadata field element, value, and language.
Step 3: Click on the “Complete” button as highlighted in the screenshot below to finish the update.
The metadata field will be highlighted to confirm the successful edit, as shown in the screenshot below. Click on the undo button if you want to undo the change.
Step 4: Click on the “Save” button to continue saving changes or “Discard” to cancel changes made in the metadata fields. A success prompt confirming
Metadata updates will appear as shown below.
Step 2: The deleted metadata field will be highlighted in red.
Step 3: Click on the “Save” button to continue saving changes or “Discard” to cancel changes made in the metadata fields. A success prompt confirming
Metadata updates will appear as shown below.
Edit Relationship
Relationship Management
Add Relationships with other items
Delete a Relationship
Relationship Management
Entity relationships are a new concept introduced in DSpace 7, helping authorized users logically link two or more items by defining relationships among them. A few good examples are the relationship between an article and its author, a journal and a journal article, or an organizational unit and the individuals in it.
Users can reach an item through multiple methods, which are listed below
Click on the “Edit” button appearing towards the right-hand side of the item title.
Step 3: Click on the “Relationships” tab.
Step 2: Identify target items for addition using the Filters and Search functions. Please note that the process is identical to the advanced search process.
Step 3: Click checkboxes appearing on the left-hand side of each Item to select them for the Relationship.
Step 4: Scroll down to the bottom of the screen and click on the “Save” button to complete the process.
The application will redirect users to the source item’s relationship tab, highlighting newly added items for easy identification.
Step 5: Click on the “Save” button to confirm the item addition under the selected relationship type or click “Discard” to undo the addition.
Users will see a prompt confirming the successful mapping of items under the selected relationship type.
Delete a Relationship
Step 1: Click on the “Delete” button appearing against the relationship you want to remove.
Step 2: A prompt listing items related to the selected Item will appear. Click the checkbox appearing next to the items. Please refer to the screenshots below for a better understanding.
List of items related to the selected Item.
Step 3: Click the “Save” button to complete the selection process and go back to the Relationship tab of the source item.
Step 4: Click on the “Save” button to confirm the deletion under the selected relationship type or click “Discard” to undo the action. Users will see a prompt
confirming the successful deletion of items under the selected relationship type.
Make an Item Private
Audience
Make Item private
Audience
Content Submitters
Community Administrators
System Administrators
1. Search an item
2. Browse communities and collections
3. Find an item in the Administration section at Edit > Item
Click on the “Edit” button appearing towards the right-hand side of the item title.
Step 3: Click on the “Make it private” button under the “Status” tab to make the selected Item private.
Step 4: Click on the “Make it Private” button to make the selected Item private or click the “Cancel” button to cancel the operation.
Step 5: You will see a success prompt confirming that the Item is private, as shown below.
Step 6: You will notice that the Item will appear with a “Private” tag.
Move an Item
Audience
Move an item
Audience
Content Submitters
Community Administrators
System Administrators
Move an item
Step 1: Login using your credentials
1. Search an item
2. Browse communities and collections
3. Find an item in the Administration section at Edit > Item
Click on the “Edit” button appearing on the right-hand side of the item title.
Step 3: Click on the “Status” tab and click the “Move” button.
Collection name field: Enter the target collection name to move the item, or select the collection from the drop-down list, as demonstrated in the following step.
1. Inherit policies: Click on this check box to update the item’s policies according to the collection’s policies.
2. Move: Click the “Move” button to complete the operation.
3. Cancel: Click the “Cancel” button to cancel the operation.
Step 5: Click on the Collection name and type the target collection name to move the item or scroll the collection list to identify the appropriate collection.
Step 6: Click on the “Move” button after selecting the target collection.
Step 7: The item will move to the target collection upon completing the operation.
Reinstate an item
Step 1: Login using the DSpace credentials
Users can reach an item through multiple methods, which are listed below
Click on the “Edit” button on the right-hand side of the item title.
Step 3: Click the “Reinstate” button under the “Status” tab to reinstate the item into the archive.
Step 4: Click the “Reinstate” button to reinstate the item or click the “Cancel” button to cancel the operation.
Step 5: Users will see a success prompt confirming the item reinstatement, as shown below.
Step 6: Users will notice that the “Withdrawn” tag appearing earlier on top of the item does not appear anymore.
Versioned Item
Audience
Create a version
A few important facts
Access Item’s versions
Additional information for the version creator
DSpace provides version creation and version management functionality. This functionality enables authorized users to create multiple versions of an item to manage changes in its metadata and attachments while keeping track of differences between versions. Users can also roll back to a previous version.
Audience
Content Submitters
Community Administrators
System Administrators
Create a version
Step 1: Login using the DSpace credentials
Step 2: Users can reach an item to create a version through various methods, which are listed below
Step 3: Users will see the “Create Version” button on the item detail page highlighted below. Click it to create a new version of an item.
Step 4: After clicking the “Create Version” button, users will see a prompt seeking a summary of the new version. Please enter a summary of the changes to be made in the latest version.
Later, this summary plays an essential role in tracking changes made in the version, which helps the broader user group and auditors.
Step 5: Users will see a success prompt confirming a new version creation, as shown below. A page similar to the item submission process will appear
with the item’s existing metadata and attachments in an editable mode.
Step 6: Users can update required metadata and attachments on this page the same way they would have done during the item submission process.
A few important facts
Users can update and add new metadata during the version update process.
Like the metadata, one can also update attachments by updating or removing existing attachments and adding new ones.
It’s possible to assign a new collection to the latest version. However, it does not change the storage location of the old version.
If the collection where the latest version will be stored has approval workflows assigned, the newest version will be published only after the necessary approvals.
Users can save a draft version during updates and pick it up from their workspace to complete later.
Step 7: Click on the “Deposit” button to complete the version creation process. Apart from clicking the “Deposit” button, users can perform other actions during version creation.
As briefed above, if the target collection has an approval workflow assigned to it, the latest version will appear to users having an approval role for acceptance. However, if no workflow is set on the collection, the new version will be published for public access.
Step 2: Users can scroll down the item details page to see its version history, as illustrated below. The version history table shows the following details:
Version: The version number of the item. The illustration shows that the selected version has * next to it.
Date: Version creation date and time as per the server.
Summary: Summary added by the user during version creation.
Step 3: The item details page shows information from the latest version. Users can click on a previous version ID to see that version.
The version in the approval workflow: If a version is unpublished due to pending approval, the “Workflow Item” tag will appear next to it. Such versions are not visible to all users.
Alert about the latest version: An alert confirming the page is not showing the newest version, with a link to the newest version, appears at the top of the page.
Withdraw an item
Step 1: Login using the DSpace credentials
Users can reach an item through multiple methods, which are listed below
Click on the “Edit” button on the right-hand side of the item title.
Step 3: Click the “Withdraw” button under the “Status” tab to withdraw the item from the archive.
Step 4: Click on the “Withdraw” button to withdraw the item or click the “Cancel” button to cancel the operation.
Step 5: Users will see a success prompt confirming the item withdrawal, as shown below.
Step 6: Users will notice that the item will appear with a “Withdrawn” tag.
Embargo an item
“Embargo an item” helps restrict access to the Item’s attachment(s) until a future date. A user can embargo an item while submitting it, or later by editing it. Both methods are explained below.
Audience
Embargo an item during the item submission
Embargo an item via edit item
Audience
1. Repository Administrator
2. Community Administrator
3. Collection Administrator
4. Item Administrator/submitter
Step 3: Click on “New” and then click on “Item” to proceed with the item addition process.
Step 4: A popup window with a collection list will appear. The user can select the target collection by typing its name or scrolling down the collection list.
Then, click on the collection to initiate item submission.
Step 5: Users will see the item submission form after selecting the target collection. The first step is to upload attachment(s) in the Item. In DSpace
terminology, an attachment is known as a “bitstream.”
Click on the “browse” link to upload attachment(s). Users can upload multiple files by selecting them together or dragging them into the space.
A progress bar showing bitstream upload progress will appear, as demonstrated in the illustration below. In addition, after a successful bitstream upload, a
prompt confirming success or failure will appear.
Bitstream Upload Successful
Step 6: After uploading the bitstream(s), the next step is to describe the Item by adding metadata.
Please refer to the Add Item process for detailed documentation on populating the metadata fields.
Step 7: Click on the edit button against any attachment to add an embargo policy.
Step 8: Users can apply multiple policies to an attachment, and there are various options available. Click on the dropdown list under “Access condition type” and select “Embargo”, as highlighted in the screenshot below.
After selecting embargo in the dropdown list, the “Grant access from” date field will be activated. Next, users can choose the future date, after which the
attachment should be accessible to the larger set of DSpace users.
Users can add multiple policies to the attachment by clicking the “Add more” link. For example, a user can define an embargo on an item until a future
date. Likewise, a lease policy can keep the attachment open access until another date in the future.
Step 9: After updating all information, the submitter clicks on the “I confirm the license above” checkbox to accept the repository’s license.
Step 10: Click on the “Deposit” button to submit the Item in DSpace. Users will see a confirmation prompt upon successful submission of the Item.
1. Search an item
2. Browse communities and collections
3. Find an item in the Administration section at Edit > Item
Click on the “Edit” button appearing on the right-hand side of the item title.
Step 3: Click on the “Authorizations” button under the “Status” tab to continue with adding the embargo policy.
Step 4: The user will see multiple options against each attachment as explained below:
1. Download Bitstream: Click on this button to download the attachment to your local device for viewing.
2. Edit Bitstream: Click on the “Edit bitstream” button to edit its details, as explained in the next step.
3. Delete Bitstream: Click on the “Delete Bitstream” button to delete the bitstream from the bundle.
Step 5: Click on “Edit bitstream’s Policies” to continue with the embargo process.
Step 6: Click on the “Add” button to create the custom embargo policy for the attachment.
Step 7: Enter details for creating the embargo policy on this form and perform the following actions:
Detailed documentation on the various options on this form is available in the ‘Edit Bitstream’ user documentation.
Users will see a success prompt upon creating the policy and will be redirected to the bitstream policy page.
Lease an item
“Lease an item” helps restrict access to the Item’s attachment(s) after a future date. A user can lease an item during submission, or later by editing it. Both methods are explained below.
Audience
Lease an item during the item submission
Lease an item via edit item
Audience
1. Repository Administrator
2. Community Administrator
3. Collection Administrator
4. Item Administrator/submitter
Step 3: Click on “New” and then click on “Item” to proceed with the item addition process.
Step 4: A popup window with a collection list will appear. The user can select the target collection by typing its name or scrolling down the collection list.
Then, click on the collection to initiate item submission.
Step 5: Users will see the item submission form after selecting the target collection. The first step is to upload attachment(s) in the Item. In DSpace
terminology, an attachment is known as a “bitstream.”
Click on the “browse” link to upload attachment(s). Users can upload multiple files by selecting them together or dragging them into the space.
A progress bar showing bitstream upload progress will appear, as demonstrated in the illustration below. In addition, after a successful bitstream upload, a
prompt confirming success or failure will appear.
Bitstream Upload Successful
Step 6: After uploading the bitstream(s), the next step is to describe the Item by adding metadata.
Please refer to the Add Item process for detailed documentation on populating the metadata fields.
Step 7: Click on the “edit” button against any attachment to lease it.
Step 8: Users can apply multiple policies to an attachment from the available options. Click on the dropdown list under “Access condition” and select “Lease”, as highlighted in the screenshot below.
After selecting “Lease”, the “Grant access until” date field will be activated. Next, choose the future date after which access to the attachment should be restricted for the larger set of DSpace users.
Users can add multiple policies to the attachment by clicking the “Add more” link. For example, a user can define a lease on an item until a future date.
And after that date, another policy can be defined for the next period.
Step 9: After updating all information, the submitter clicks on the “I confirm the license above” checkbox to accept the repository’s license.
Step 10: Click on the “Deposit” button to submit the Item in DSpace. Users will see a confirmation prompt upon successful submission of the Item.
1. Search an item
2. Browse communities and collections
3. Find an item in the Administration section at Edit > Item
4. Click the “Edit” button on the right side of the item title.
Step 3: Click on the “Authorizations” button under the “Status” tab to continue with leasing attachment(s).
Step 4: The user will see multiple options against each attachment as explained below:
1. Download Bitstream: Click on this button to download the attachment to your local device for viewing.
2. Edit Bitstream: Click on the “Edit bitstream” button to edit its details, as explained in the next step.
3. Delete Bitstream: Click on the “Delete Bitstream” button to delete the bitstream from the bundle.
Step 5: Click on “Edit bitstream’s Policies” to continue the leasing process.
Step 6: Click on the “Add” button to create the custom lease policy for the attachment.
Step 7: Enter details for creating the lease policy on this form and perform the following actions:
Detailed documentation on the various options on this form is available in the ‘Edit Bitstream’ user documentation.
Users will see a success prompt upon creating the policy and be redirected to the bitstream policy page.
DSpace 7 Demo Quick Start
Front end
https://demo.dspace.org
OAI-PMH
https://demo.dspace.org/server/oai/request?verb=Identify
REST-API
https://demo.dspace.org/server/
SWORD v1
https://demo.dspace.org/server/sword/servicedocument
SWORD v2
https://demo.dspace.org/server/swordv2/servicedocument
The media viewer (video and image) and the IIIF Mirador player are disabled on the Demo.
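The DSpace 7 endpoints above can be sanity-checked from a terminal. This snippet simply rebuilds the URLs listed on this page; the curl calls are commented out because they require network access to the demo server.

```shell
# Endpoints from the DSpace 7 demo quick start above.
BASE="https://demo.dspace.org/server"
OAI_IDENTIFY="${BASE}/oai/request?verb=Identify"
SWORD_V2="${BASE}/swordv2/servicedocument"
echo "REST API root:    ${BASE}/"
echo "OAI-PMH Identify: ${OAI_IDENTIFY}"
echo "SWORD v2:         ${SWORD_V2}"
# To probe the live demo (network required):
# curl -s "${OAI_IDENTIFY}" | head -n 3
# curl -s "${BASE}/api" | head -n 3
```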
DSpace 6 Demo
https://demo6.dspace.org/
https://demo6.dspace.org/xmlui/
OAI-PMH
https://demo6.dspace.org/oai/request?verb=Identify
REST-API
https://demo6.dspace.org/rest
SWORD v1
https://demo6.dspace.org/xmlui/sword/servicedocument
SWORD v2
https://demo6.dspace.org/xmlui/swordv2/servicedocument
Management sidebar
Many of the administrative functions can be accessed from the Management sidebar. This list maps the menu to more detailed information.
New
Community
Collection
Add item
Process
Edit
Community
Collection
Item
Import
Metadata
Export
Metadata
Access Control
People
Groups
Admin Search
Registries
Metadata
Format
Curation Task
Processes
Administer Workflow
Health
New
Quickly create or edit objects from anywhere in the system. Either browse to the object first, or search for it using the Admin sidebar. - Release Notes
Community
Collection
Add item
Process
Processes UI (video) allows Administrators to run backend scripts/processes while monitoring their progress & completion. - Release Notes
See Command Line Operations for more detail about these commands.
Edit
Quickly create or edit objects from anywhere in the system. Either browse to the object first, or search for it using the Admin sidebar.
Bitstream Editing (video) has a drag-and-drop interface for re-ordering bitstreams and makes adding and editing bitstreams more intuitive.
Metadata Editing (video) introduces suggest-as-you-type for field name selection of new metadata. - Release Notes
Community
Edit
Delete
Collection
Edit
Delete
Item
Edit
Delete
Import
You can drop or browse CSV files that contain batch metadata operations. When "Validate Only" is selected, the uploaded CSV will only be validated: you will receive a report of detected changes, but no changes will be saved.
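As a sketch of what such a CSV looks like: the batch metadata editing format uses an `id` column plus `schema.element.qualifier` field names, with `||` separating multiple values in one cell. The item UUID below is a placeholder, not a real identifier.

```shell
# Minimal batch-metadata CSV sketch (placeholder item UUID).
cat > batch-example.csv <<'EOF'
id,dc.title,dc.contributor.author
1a2b3c4d-0000-0000-0000-000000000001,Sample title,"Doe, Jane||Roe, Richard"
EOF
head -n 1 batch-example.csv
```

Uploading a file like this with "Validate Only" checked reports the detected changes without saving them.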
Metadata
Export
Metadata
Access Control
Login As (Impersonate) another account allows Administrators to debug issues that a specific user is seeing, or to do some work on behalf of that user. (Log in as an Administrator, click "Access Control" in the sidebar, then click "People". Search for the user account and edit it. Click the "Impersonate EPerson" button. You will be authenticated as that user until you click "Stop Impersonating EPerson" in the upper right.) - Release Notes
People
Groups
Admin Search
Administrative Search (video) combines retrieval of withdrawn items and private items, together with a series of quick action buttons. - Release
Notes
Registries
Metadata
Format
Curation Task
Processes
Processes UI (video) allows Administrators to run backend scripts/processes while monitoring their progress & completion. - Release Notes
Details about each of the available processes/scripts can be found in the "scripts" directory of the REST API docs: https://github.com/DSpace/RestContract/blob/main/script
Additional information can also be found in the Command Line Operations documentation.
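For scripted use, the REST contract exposes these scripts under /api/system/scripts; a process is started by POSTing the script's parameters, serialized as a JSON array of name/value pairs in a "properties" form field, to the script's "processes" sub-resource. A hedged sketch of building such a request (the server URL and script name are placeholders, and the exact contract may vary by DSpace version):

```python
import json

def build_process_request(server, script_name, params):
    """Return the endpoint URL and the 'properties' form field used to
    launch a DSpace script, e.g. via a multipart/form-data POST."""
    url = f"{server}/api/system/scripts/{script_name}/processes"
    properties = json.dumps([{"name": n, "value": v} for n, v in params])
    return url, properties

url, properties = build_process_request(
    "https://demo.dspace.org/server",  # placeholder backend URL
    "metadata-export",
    [("-i", "123456789/2")],
)
```

The actual POST would also need an authenticated session (bearer token and CSRF token), which is omitted here.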
Administer Workflow
Administer Active Workflows (video) allows Administrators to see every submission that is currently in the workflow approval process. - Release
Notes
Health
Admin "Health" menu provides basic control panel functionality (based on 6.x Control Panel). When logged in as an Administrator, select
"Health" from the side menu. You'll see a "Status" tab which provides useful information about the status of the DSpace backend, and an "Info"
tab which provides an overview of backend configurations and Java information. - Release Notes
Menus
Communities & Collections
All of DSpace
By Issue Date
By Author
By Title
By Subject
Statistics
Search box
Language
Log In
Profile
MyDSpace
All of DSpace
Browse by Issue Date, Author, Title, or Subject. Use the gear icon to set the sorting and number of results per page.
By Issue Date
By Author
By Title
By Subject
Statistics
(Once logged in with correct access)
On the Home page, this lists the 10 items whose item pages received the most visits. (This does not count downloads of the files contained in the items.)
On a Community page, this lists the total cumulative visits to the community's main page (not to all the collections and items that community contains), visits to the community main page broken out by month for the past 7 months, and the top cumulative country and city views.
On a Collection page, this lists the total cumulative visits to the collection's main page (not to all the items that collection contains), visits to the collection page broken out by month for the past 7 months, and the top cumulative country and city views.
On an Item page, this lists the total cumulative visits to the item page (this does not count downloads of the files contained in the item), visits to that page broken out by month for the past 7 months, and the top cumulative country and city views.
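The "most visited" lists above are simple top-N selections over page-view counts, with file downloads deliberately excluded. An illustrative sketch with made-up numbers:

```python
import heapq

# Hypothetical per-item page-view counts. Downloads are tracked separately
# and excluded, matching how the home-page statistics are described.
page_views = {
    "item-a": 120, "item-b": 340, "item-c": 95,
    "item-d": 210, "item-e": 15,
}

# Select the N most-visited items, highest count first.
top = heapq.nlargest(3, page_views.items(), key=lambda kv: kv[1])
```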
Search box
Use this search box to retrieve search results that can be further refined by search filters, such as date, subject, author, and item type.
See Search - Advanced for tips for creating Boolean searches.
Language
Choose your preferred supported language for the DSpace system. Repository files and metadata will be in their source language.
Log In
(Once logged in)
Profile
If enabled in the repository, you can set up a Researcher Profile.
You can also reset your password and view your group memberships.
MyDSpace
View or edit your submissions:
Registry management
Info for repository managers, covering topics such as metadata registry management and format registry management.
Metadata Registry Management
The metadata registry maintains the list of metadata schemas available in DSpace. Each schema consists of a set of metadata elements. At a minimum, DSpace requires the qualified Dublin Core schema.
Audience
Create Metadata Registry
Metadata Registry Management
Delete Metadata Schema
Audience
1. Repository Administrator
2. Community Administrator
Step 1: Click on the "Log In" link at the top right corner of the DSpace home page; a pop-up will open, as illustrated in the screen below.
Step 2: Enter your user ID and password and click on the login button to log in to DSpace.
Step 3: Users with administrative rights will see the admin menu on the left-hand side of the screen, as shown in the illustration below.
Step 4: Hover your cursor over the administration menu and click on Registries, then click on Metadata to go to the Metadata Registries.
Step 5: Enter the value of your choice in the ‘Namespace’ field and the short value in the ‘Name’ field.
Step 6: Click on the Save button to create the metadata schema. A success prompt will appear upon successful creation of the schema.
Step 7: Scroll down the list to see the schema you added. Click on its namespace or name value to access the metadata schema.
1. Element – the primary value in the metadata schema. For example, if a date needs to be added as a metadata element, the element's value can be "date".
2. Qualifier – use the qualifier to further refine the metadata element. For example, to record multiple dates under one element, add a qualifier for each date type: there can be a content listing date, a content de-listing date, and other types of dates.
3. Scope Note – enter a definition of the created metadata element for the benefit of other users.
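Schema, element, and qualifier combine into the dotted field names used throughout DSpace (e.g. dc.date.issued). A minimal sketch of that composition:

```python
def field_name(schema, element, qualifier=None):
    """Compose a DSpace metadata field name: schema.element[.qualifier]."""
    parts = [schema, element]
    if qualifier:
        parts.append(qualifier)
    return ".".join(parts)

issued = field_name("dc", "date", "issued")  # a qualified field
title = field_name("dc", "title")            # an unqualified field
```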
Step 9: Click on the Save button to add an element to the selected metadata schema. You will see a confirmation prompt upon the successful addition of
the element.
Step 1: Go to the home page of DSpace and click on the "Log In" link at the top right corner of the screen; a pop-up will open, as illustrated in the screen below.
Step 2: Enter your user ID and password and click on the login button to log in to DSpace.
Step 3: Users with administrative rights will see the admin menu on the left-hand side of the screen, as shown in the illustration below.
Step 4: Hover your cursor over the administration menu and click on Registries, then click on Metadata to go to the Metadata Registries.
Step 5: Click on the namespace or name of the Metadata schema you want to edit.
Step 6: Click on the metadata element you want to edit. Upon clicking the target element, its values will be populated in the corresponding fields under the
Edit Metadata fields section, as shown below.
Step 7: Update the value under the target field(s) and click on the "Save" button. A success prompt will appear upon a successful update, and the updated metadata element will appear in the metadata schema.
Delete Metadata Schema
Step 1: Go to the home page of DSpace and click on the "Log In" link at the top right corner of the screen; a pop-up will open, as illustrated in the screen below.
Step 2: Enter your user ID and password and click on the login button to log in to DSpace.
Step 3: Users with administrative rights will see the admin menu on the left-hand side of the screen, as shown in the illustration below.
Step 4: Hover your cursor over the administration menu and click on Registries, then click on Metadata to go to the Metadata Registries.
Step 5: Click on the namespace or name of the Metadata schema under which you want to delete an element.
Step 6: Click on the checkbox appearing on the left of the metadata element you want to delete.
Step 7: Click on the ‘Delete selected’ button appearing in red at the bottom of the page. Upon successful deletion of the element, you will see a prompt
confirming deletion of the field.
Step 8: To delete the entire metadata schema, click on the checkbox appearing on the left of the target metadata schema(s).
Step 9: Click on the "Delete selected" button appearing in red at the bottom of the page. Upon successful deletion of the schema, you will see a prompt confirming the deletion.
Request-a-copy
Scope
Covers the use of the request-a-copy feature, which allows users to request bitstreams that are under embargo. This information does not cover how to activate or de-activate the feature.
Use Case
The repository manager wants to be able to advise their depositors of the existence of the feature, so that depositors are prepared to receive requests.
Audience
Repository managers, anonymous users, submitters
Feature description
If the file(s) (i.e. bitstreams) in a deposit are under embargo, they will not be available for users to download directly. However, users can send the depositor a request for the file(s) through DSpace by clicking on a filename hyperlink and completing the form. DSpace sends the details of the request to the depositor by email. The depositor can then agree to or decline the request. If the depositor uses the link in the email, and the form it opens in DSpace, to grant the request, the system will attempt to email the file(s) to the requester.
Known limitation
Some files are too large for many email servers to handle; for example, files over 150 MB have proven impossible for some systems to send.
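A repository could guard against this by comparing the total size of the requested bitstreams against the mail server's attachment limit before attempting to send. A minimal sketch (the 150 MB threshold is taken from the example above, not a configured DSpace value):

```python
# Illustrative attachment limit from the text above; real mail servers
# enforce their own, often much smaller, limits.
ATTACHMENT_LIMIT_BYTES = 150 * 1024 * 1024

def can_email(bitstream_sizes):
    """Return True if the combined size of the requested files stays
    under the assumed attachment limit."""
    return sum(bitstream_sizes) <= ATTACHMENT_LIMIT_BYTES

small_ok = can_email([40 * 1024 * 1024, 60 * 1024 * 1024])  # 100 MB total
large_ok = can_email([200 * 1024 * 1024])                   # a 200 MB file
```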
Search - Advanced
Advanced Search (including boolean options) is already supported in the DSpace 7 search page. Boolean keywords can be used, and you can also
specify to search within specific fields by name. Some examples:
Basic searching: Searching test power will return results with both these words in them (this is equivalent to an "AND" boolean search). E.g. https://demo.dspace.org/search?query=test%20power
Boolean searching options
Searching test AND power will return results with both these words in them. E.g. https://demo.dspace.org/search?query=test%20AND%20power
Searching test OR power will return results with either of these words in them. E.g. https://demo.dspace.org/search?query=test%20OR%20power
Searching test NOT power will return results with "test" but not "power". E.g. https://demo.dspace.org/search?query=test%20NOT%20power
Phrase searching: Searching "test power" (in quotes) will return results with the exact phrase "test power" in them. E.g. https://demo.dspace.org/search?query=%22test%20power%22
Searching within specific fields
Searching dc.title:test will only return results where the dc.title field includes "test". E.g. https://demo.dspace.org/search?query=dc.title:test
Searching dc.title:test AND dc.description.abstract:green will only return results where the dc.title field includes "test" and the dc.description.abstract field includes "green". E.g. https://demo.dspace.org/search?query=dc.title:test%20AND%20dc.description.abstract:green
Searching dc.subject:fin* will only return results where one (or more) dc.subject fields start with "fin" (e.g. finance, financial, finish, etc.). E.g. https://demo.dspace.org/search?query=dc.subject:fin*
Wildcard searching:
Searching test pow* will return results including "test" and any word starting with "pow". E.g. https://demo.dspace.org/search?query=test%20pow*
Searching dc.description.abstract:* will return results that include the "dc.description.abstract" metadata field (with any value in it). E.g. https://demo.dspace.org/search?query=dc.description.abstract:*
Range searching:
Searching dc.date.issued:[1999 TO 2003] will return results that have a "dc.date.issued" metadata field with a date between 1999 and 2003 (inclusive). E.g. https://demo.dspace.org/search?query=dc.date.issued:%5B1999%20TO%202003%5D
Searching dc.date.issued:[2010 TO *] will return results that have a "dc.date.issued" metadata field with a date of 2010 or later. E.g. https://demo.dspace.org/search?query=dc.date.issued:%5B2010%20TO%20*%5D
Special characters: Some characters have special meaning in searches, e.g. the colon (:), asterisk (*), boolean operators, etc. If you need to search for these characters exactly, surround them with double quotes.
Searching "test:power" will search for that string exactly (including the colon character). (NOTE: Without the quotes, DSpace would attempt to perform "Searching within specific fields" (see above), as the colon is a special character.)
DSpace supports all Solr search syntax options, as all searches in DSpace are sent directly to Solr. For more examples, see the Solr documentation for
the "Specifying Terms for the Standard Query Parser".
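The encoded query strings in the examples above follow standard URL percent-encoding: a space becomes %20, [ becomes %5B, and a double quote becomes %22. A minimal sketch of building such URLs, keeping ":" and "*" literal so the field-search and wildcard syntax survives (the base URL is the demo server from the examples):

```python
from urllib.parse import quote

def search_url(query, base="https://demo.dspace.org/search"):
    # Keep ':' (field search) and '*' (wildcards) unescaped; everything
    # else, including spaces, brackets, and quotes, is percent-encoded.
    return f"{base}?query={quote(query, safe=':*')}"

u1 = search_url("test power")
u2 = search_url("dc.date.issued:[1999 TO 2003]")
```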
User management
Documentation for repository managers.
Add or Manage an E-Person
An EPerson is a user in DSpace who can be assigned various rights to perform activities or to manage content access in the repository.
This section describes the various methods for creating or updating an EPerson in DSpace. DSpace allows users to self-register, and users with administrative rights can create and update EPersons in the system. Both methods are explained below.
Audience
Add EPerson – Self Registration
Add EPerson – Registration by Administrator
Update an EPerson
Update an EPerson – Update details & Manage Log in
Update an EPerson – Manage user group membership
Update an EPerson – Delete EPerson
Audience
1. Repository Administrator
2. Community Administrator
3. Collection Administrator
4. Base User
Step 3: Enter the email address of the user who needs to be registered as an EPerson in DSpace and click on the Register button.
Step 4: After clicking the Register button, an email will be sent to the user's email address, and the user will be redirected to the home page. This email is sent from the communication address registered in the DSpace instance.
Step 5: Click on the unique registration link received in the email to continue with the registration process. If your email client or server security settings do not show hyperlinks, you can copy the link and paste it into your browser.
Step 6: Enter a password of your choice and re-enter the same password, then click on the "Submit Password" button to complete the registration process.
Step 1: Go to DSpace's home page and click on the "Log In" link at the top right corner of the screen; a pop-up will open, as illustrated in the screen below.
Step 2: Enter your user ID and password and click on the login button to log in to DSpace.
Step 3: Users with administrative rights will see the admin menu on the left-hand side of the screen, as shown in the illustration below.
Step 4: Hover your cursor over the administration menu and click on Access Control, then click on the People link to go to the EPeople module.
Step 5: Click on the "Add EPerson" button to initiate the EPerson creation process.
Step 6: Enter "First Name", "Last Name," and "E-Mail". Select the "Can log in" checkbox to allow this account to log in to DSpace.
Step 7: Once values are entered in the mandatory fields, the "Save" button will be activated. Click on the Save button to complete the process.
Step 8: Upon successfully adding the EPerson in DSpace, you will see a success prompt on the screen.
Step 9: The administrator can ask the user to set their password using the forgotten-password mechanism, explained in the following steps.
Step 10: The user goes to the home page of DSpace and clicks on the "Log In" link at the top right corner of the screen; a pop-up will open, as illustrated in the screen below.
Step 11: Click on the "Have you forgotten your password?" link to open the password reset page.
Step 12: Enter the email address used for the EPerson creation in DSpace and click on the "Save" button.
Step 13: A prompt will appear confirming that a password reset link has been sent to the registered email address, and the user will be redirected to the home page. This email is sent from the communication address registered in the DSpace instance.
Step 14: Click on the unique link received in the email to continue the password reset process. If the user's email client or server security settings do not show hyperlinks, the user can copy the link and paste it into their browser.
Step 15: Enter a password of your choice and re-enter the same password, then click on the "Submit Password" button to complete the password generation process.
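Administrators scripting account creation can do the same over the REST API: per the DSpace 7 REST contract, an EPerson is created by POSTing a JSON body to /api/eperson/epersons. A hedged sketch of building that body (field names follow the contract as I understand it and may vary by version; the name and email are placeholders):

```python
import json

def eperson_payload(first, last, email, can_log_in=True):
    """Build the JSON body for creating an EPerson (a sketch of the
    DSpace 7 REST contract's EPerson resource)."""
    return json.dumps({
        "email": email,
        "canLogIn": can_log_in,
        "requireCertificate": False,
        "metadata": {
            "eperson.firstname": [{"value": first}],
            "eperson.lastname": [{"value": last}],
        },
    })

body = eperson_payload("Ada", "Lovelace", "ada@example.org")
```

As with the UI flow above, the new account would still set its password via the forgotten-password mechanism (or a registration link).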
Update an EPerson
EPerson updates can be performed by users with System, Community, or Collection Administrator rights. These users can perform the following activities:
Step 2: Enter your user ID and password and click on the login button to log in to DSpace.
Step 3: Users with administrative rights will see the admin menu on the left-hand side of the screen, as shown in the illustration below.
Step 4: Hover your cursor over the administration menu and click on Access Control, then click on the People link to go to the EPeople module.
Step 7: Click on the "Edit" button to continue with EPerson editing.
Step 8: You can make the following updates on the Edit EPerson page:
1. Update EPerson details – You can update the First Name, Last Name, and Email ID used as the user's identification details.
2. Enable/Disable User Login – Uncheck this option to disable the user's login to DSpace without deleting the account. This is helpful when a user needs to be temporarily prevented from logging in and performing actions.
3. Requires Certificate – Configure this option if a certificate is to be used for login.
4. Reset Password – The EPerson's password can be reset using this option. A password field must be present to use this option.
5. Impersonate EPerson – A user with rights to impersonate an EPerson can impersonate the selected EPerson and perform all activities that the EPerson is entitled to.
6. Delete EPerson – Using this option, the EPerson can be permanently deleted from DSpace.
7. Update User group(s) – This button allows the administrator to add the selected EPerson to multiple user groups.
You can change options 1 to 3 and click on Save to update the EPerson record.
Step 9: Upon successful execution of the update, you will notice a success prompt on the screen.
Step 11: Click on the "Edit" button next to the user group you want to select for this EPerson.
Step 12: Enter the EPerson's name or any other metadata value in the search field to find the target user.
Step 13: You will see EPeople appearing as results of the search. Click on the + button next to the EPerson you want to add to this group.
Step 14: Upon successfully adding the EPerson to the group, a success prompt will appear on the screen, and the EPerson will appear in the group's Current Members list.
Step 15: You will see this group on the EPerson's profile page as well.
Step 17: DSpace will show a confirmation prompt asking you to confirm the deletion of the selected EPerson. Click on "Delete" to continue with the deletion, or click on "Cancel" to cancel it.
Step 18: Successful deletion of the user will be confirmed by a success prompt, and you will be redirected to the EPeople page.
Create or manage a user group
User groups in DSpace are meant for grouping EPeople. These groups can be assigned access rights that allow their members to perform multiple activities or to manage content access in the repository.
This section provides steps to add or update a User group in the application.
Audience
Add a user group
Manage a user group
Update group details
Delete Group
Add/Manage EPeople
Add/Manage subgroups
Audience
Repository Administrator
Community Administrator
Step 1: Click on the "Log In" link at the top right corner of DSpace's home page; a pop-up will open, as illustrated in the screen below.
Step 2: Enter your user ID and password and click on the "Log in" button to log in to DSpace.
Step 3: Users with administrative rights will see the admin menu on the left-hand side of the screen, as shown in the illustration below.
Step 4: Hover your cursor over the administration menu, click on "Access Control," and then click on the "Groups" link.
Step 5: Click on the "Add Group" button to create a Group.
Step 6: The "Group Name" is compulsory. Adding a description of the group is good practice, as it helps a broader user base understand its purpose.
The "Save" button will be activated upon entering the group name.
Step 7: Upon successfully adding the group in DSpace, you will see a success prompt on the screen.
Step 8: The newly created group will appear in the groups list, as highlighted below.
Manage a user group
User groups can be updated by users with System, Community, or Collection Administrator rights. These users can perform the following activities:
Step 2: Enter your user ID and password and click on the "Log in" button to log in to DSpace.
Step 3: Users with administrative rights will see the admin menu on the left-hand side of the screen, as shown in the illustration below.
Step 4: Hover your cursor over the administration menu, click on "Access Control," and then click on the "Groups" link.
Step 5: Find the group to be edited or deleted by scrolling down the groups list or by entering the group's metadata in the search field on the Groups page. See the illustration below for an example.
Step 6: Click on the "Edit" button next to the target group.
Step 7: You can make the following updates on the Edit group page.
Step 8: To update the group details, update the group name and description and click on the "Save" button.
Step 9: A success prompt will appear confirming the update, as displayed below.
Delete Group
Step 10: Click on the "Delete Group" button to permanently delete the group from DSpace.
Step 11: DSpace will show a confirmation prompt asking you to reconsider your decision to delete the group. Click on "Delete" if you want to continue, or click "Cancel" to return.
Please note that a group, once deleted, cannot be recovered.
Step 12: You will see a success prompt confirming the user group deletion, as demonstrated below.
Add/Manage EPeople
Step 13: Scroll down to the EPeople section to find the user(s). Search EPeople by metadata values and click the "Search" button, or click the "Browse All" button to list all EPeople.
Step 14: Click on the add button '+' next to a user name to add that user as a group member.
Step 15: A prompt will appear confirming the addition of the user to the group. As illustrated below, a "Delete" button will appear next to the user's name.
Success prompt confirming user added as members
Add/Manage subgroups
Step 16: Scroll down to the Groups section to find the group(s). Enter a group name and click the "Search" button, or click the "Browse All" button to list all groups.
Step 17: Click on the add button '+' next to a group name to add it as a subgroup.
Step 18: A prompt will appear confirming that the subgroup was added to the group. As illustrated below, a "Delete" button will appear next to the group names.
Success prompt confirming groups added as subgroups
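Because groups can contain subgroups, a user's effective memberships are resolved transitively: belonging to a subgroup implies belonging to every group that contains it. A minimal sketch of that resolution (the group names and nesting here are hypothetical):

```python
from collections import deque

# Hypothetical group -> subgroup relationships.
subgroups = {
    "Administrator": ["Curators"],
    "Curators": ["Student Assistants"],
    "Student Assistants": [],
}

def expand_group(group):
    """Return the group plus every subgroup reachable from it,
    walking the subgroup relationships breadth-first."""
    seen, queue = set(), deque([group])
    while queue:
        g = queue.popleft()
        if g not in seen:
            seen.add(g)
            queue.extend(subgroups.get(g, []))
    return seen

members_of_admin = expand_group("Administrator")
```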