Summary and Closing

As the year turned to 2015, the Open Source Archive (OSA) project was completed. However, this doesn’t mean the end of OSA as a platform and software. This post is a high-level overview of what we achieved, what we released and what is coming next.

OSA was a development project first and foremost. Many of the results are packed into the OSA software, but we produced a few publications as well. Like all software, OSA is never really complete. Once the initial roadmap is done, we can add new features and improve the existing ones. There is always some tweaking, additional user interfaces and such to be developed. We are proud to declare OSA a pilot-ready platform. Surely there could be a couple of bugs and missing features, but that’s the nature of software. The final release of the OSA project is intended to be developed further into production-ready software or adopted as a platform for any digital archive or repository project.


  • Content agnostic digital archive and repository platform
  • Premade models for common objects (document, audio, moving image, picture etc.)
  • Ingest, management and distribution of digital contents (files and/or metadata)
  • Full-text and natural search
  • Flexible and granular access management
  • Linked data and ontology support
  • SaaS and multitenancy support
  • Completely customizable user interface, content models and preservation policies
  • And lots more


In addition, six bachelor’s theses were completed during the project.

The software developed during the project is available as open source on GitHub. There is also compact documentation and example configurations to get you started. We also provide a working demo for fast evaluation.

GitHub repository:
Demo site:

As the project is now completed, this blog will probably not be updated anymore. We could use it to communicate future developments, but we haven’t decided on the channels yet. The GitHub repository and the demo site will be kept updated with the latest releases, documentation and contact information.

Thank you for following these posts. If you wish to stay in touch or ask anything, just drop an email to or

Posted in Archive system, Digital archiving, Project Information, Publications, Software development

Towards open and sustainable information

We are proud to announce an upcoming seminar, Towards Open and Sustainable Information, on 19-20.11.2014 in Mikkeli. The full program and registration are available here.

It’s a free two-day seminar on topics such as:

  • Finnish national data exchange layer and Estonian X-Road solution
  • Digital preservation and archiving
  • Digital information management and standards
  • Open source and sustainable development
  • My data and human-centric open data

We have an excellent list of high-profile speakers from international and national organizations such as the Ministry of Finance, the Estonian Ministry of Economic Affairs and Communications, the Ministry of Education and Culture, the University of Eastern Finland, the University of Hull, Plymouth University, Open Knowledge Finland, the National Archives of Finland, the National Archives of Sweden and others.

The seminar will be bilingual: the first day will be in English and cover international topics. The second day will be in Finnish and focus more on national-level topics. On both seminar days there will be an exhibition by partners, private companies, organizations and projects operating in the field of the seminar topics. On the evening of the first day we will host a dinner for our speakers and guests. The details will be announced later, but there will be food and networking opportunities.

The seminar is hosted by the Open Source Archive (OSA) and SOTU-sähkö projects. As said earlier, the seminar will be free of charge and we will provide you with coffee and a soup lunch. The evening event will be paid for by the participants themselves. The project publications will be available during the seminar. The publications contain detailed articles by the speakers and on the project topics.

As a last announcement, I would like to note that we will do our best to update the blog more often. As is often the case with projects, the schedule tightens towards the end. We have lots of development going on, as well as publications and events.

If you haven’t already checked out the seminar page, please do so now:

Posted in Data management, Digital archiving, Publications, Seminar

Workflow engine under way

We started integrating a workflow engine into the OSA-application last week. The workflow engine is being developed as a bachelor’s thesis by Heikki Kurhinen. The user interfaces for batch ingest were already done for the beta version, but the functionality behind them has been missing.
We are going to develop several micro services for OSA-specific workflows. These workflows will generate metadata automatically before we ingest documents into our archiving system. Metadata generated automatically during workflow execution is stored temporarily in a Mongo database. The aim is to bring automation to the ingest process. Users can ingest several files at the same time, and the content will be handled in the same way during the ingest process. Regardless of the ingest style (manual or batch ingest using the workflow engine), the last step of our ingest process is still XML schema validation, which is already available in the current version. We have defined schemas for all data types to guarantee the minimum requirements of the Capture-datastream. The schemas are stored in the content models in the Fedora Commons Repository.
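To make the idea more concrete, here is a minimal Python sketch of a micro-service style workflow. It is not the actual workflow engine: the service names, the metadata fields and the `REQUIRED_FIELDS` check are all hypothetical, a plain dictionary stands in for the temporary Mongo database, and a simple required-field check stands in for the real XML schema validation.

```python
import hashlib
import mimetypes

# Hypothetical micro services: each one takes a file name and its bytes
# and returns a dict of automatically generated metadata fields.
def checksum_service(name, data):
    return {"sha256": hashlib.sha256(data).hexdigest()}

def format_service(name, data):
    mime, _ = mimetypes.guess_type(name)
    return {"mime_type": mime or "application/octet-stream"}

def size_service(name, data):
    return {"size_bytes": len(data)}

# The workflow is just an ordered list of services (an assumption for this sketch).
WORKFLOW = [checksum_service, format_service, size_service]

# Stand-in for the minimum requirements that a schema would enforce.
REQUIRED_FIELDS = {"sha256", "mime_type", "size_bytes"}

def run_workflow(batch, store):
    """Run every service on every file so the whole batch is handled the same way."""
    for name, data in batch.items():
        record = {"file": name}
        for service in WORKFLOW:
            record.update(service(name, data))
        store[name] = record  # temporary store; a dict stands in for MongoDB
    return store

def validate(record):
    """Stand-in for the final validation step before ingest."""
    return REQUIRED_FIELDS.issubset(record)

batch = {
    "report.pdf": b"%PDF-1.4 example",
    "photo.jpg": b"\xff\xd8\xff example",
}
temp_store = run_workflow(batch, {})
ready = sorted(name for name, record in temp_store.items() if validate(record))
print(ready)
```

Because every file passes through the same list of services, adding a new automatic metadata step only means adding one more function to the list.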

While developing new features for the OSA-application, we are also collecting user feedback on the current OSA-application version running in
Of course we are testing the OSA-application ourselves, but we cannot cover real-life use cases when we mainly test newly created functionality. Feedback from project partners is essential for keeping the development roadmap and users’ real requirements in balance.

Posted in Uncategorized

Beta released

Open Source Archive beta was published yesterday. You can visit it at The default interface is used to search and inspect the public content (which is currently very limited). Once logged in, you can access the full features for information managers, researchers and archive administrators. If you would like to have a test account, just drop me an email or leave your details in a comment. In future, we will also open a public registration for test accounts. The final software will be released as open source and provided with a SaaS model if you prefer a turnkey solution.

So far, development has focused on core features (access rights, archive management, ingest, description, searching and indexing) and, of course, the user interface. Next on the roadmap are multiple pilot cases in which we develop more advanced features and solutions to specific problems. Here are a few highlights: batch and automated ingests, distributed workflows, and discovery and visualization of metadata and other content for end users (both researchers and non-researchers).

The project personnel and the control group were both satisfied. It has taken a lot of work to get this far. Still, the software is very much in development and we will keep on making it better, more pleasant and loaded with useful features. We would be grateful for all feedback and critical comments, and we take both the feedback and the people giving it very seriously. Weekly updates are released to fix any inconveniences and bugs there might be in the first beta. We don’t plan to release perfect software by designing it ourselves. That would be impossible. We learn and iterate, so each release will be better.


Posted in Archive system, Software development

Less than a week to beta

It’s soon time to launch the beta. It will go live next Tuesday. It’s a milestone, but as we practice near-continuous integration and can publish minor versions multiple times a week, it is not that drastic a change. The difference from the previous alpha is huge, though. The beta is most likely going to evolve a lot during its first weeks. I will write more about the launch next week. At that time, we will also start planning the first pilot cases and begin implementing them as soon as possible.

There has been a major milestone even before the beta launch. We got our local DAITSS installation working. DAITSS is our choice for the dark archive, developed by the Florida Center for Library Automation (FCLA). Finding the best components and compatible software for DAITSS was quite a tricky task, but the system should be very stable once installed. Next we will ingest data into it and try it in action. If all goes well, we can mirror and distribute the system to keep the data even safer.

The latest developments include unit testing and improvements to the access rights system and the user interface. It has been really time-consuming to build a solid interface, even with the feedback from our test users. Many assumptions made by developers turn out not to be quite right. Fine-tuning each detail takes time. The key is to identify and model the use cases and processes well. Luckily, some great theses on designing the user experience for digital archives were completed last year, and one more is being completed this spring. If I could change one thing, I would start the interface design even earlier.

I will keep this posting short. More time for beta development, less time to write about it.


Posted in Daitss, Software development

Data management and upcoming conferences

First, we would like to announce that we will be participating in the Archiving 2014 conference in Berlin in May. The OSA project has two papers: Flexible Data Model for Linked Objects in Digital Archives by Mikko Lampi (MAMK) and Olli Alm (ELKA – Central Archives of Finnish Business Records), and Micro-services Based Distributable Workflow for Digital Archives by Heikki Kurhinen (MAMK/Otavan Opisto) and Mikko Lampi (MAMK). The first paper is about the data model designed during the Capture project. The model is implemented in the OSA software and will be developed further until the end of the project. The main technologies behind the model are the Fedora object model and RDF. The workflow paper is about the software developed by Heikki as his bachelor’s thesis. It is designed to be very simple and easy to integrate with any software.

Here is the latest development news. We started implementing mass modification features for managing object metadata before and after ingest. Mass updating is very useful for describing batches of files before ingesting them. Adding common metadata about the owner or the origin can help the ingest and management processes. But we found that it is not that simple to modify descriptive metadata after the initial ingest. Therefore, in the upcoming beta version, mass updates are available only for files in the workspace before ingesting. We will continue to develop mass updating for archived content during the spring.
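As a rough illustration of a mass update on workspace files, here is a small Python sketch. The function name `mass_update`, the field names and the file names are made up for this example and are not taken from the OSA code; the workspace is modeled as a plain dictionary of file name to metadata.

```python
def mass_update(workspace, selected, common, overwrite=False):
    """Return a copy of the workspace where the selected files get the common fields.

    Existing values are kept unless overwrite is True.
    """
    updated = {}
    for name, metadata in workspace.items():
        merged = dict(metadata)  # copy so the original workspace is not modified
        if name in selected:
            for key, value in common.items():
                if overwrite or key not in merged:
                    merged[key] = value
        updated[name] = merged
    return updated

# Hypothetical workspace contents waiting for ingest.
workspace = {
    "letter_001.tif": {"title": "Letter 1"},
    "letter_002.tif": {"title": "Letter 2", "owner": "Unknown"},
    "notes.txt": {"title": "Notes"},
}

result = mass_update(
    workspace,
    selected={"letter_001.tif", "letter_002.tif"},
    common={"owner": "Example Archive", "origin": "Donation 2014"},
)
print(result["letter_001.tif"])
```

By default the sketch does not overwrite values that a user has already entered, which seemed like the safer choice for a batch operation; passing `overwrite=True` replaces them.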

Many of the Fedora content models have been refactored to contain only the absolute minimum data required to understand and manage the objects. Information about forms, organization-specific mappings and the like was removed from the archive because these are only views or interpretations of the data and not the data definition itself. This will make the design more consistent and give organizations and users much better customization features.

After the last posting, we found out that the current GSearch solution doesn’t support the latest Solr, which we require because of its Finnish language support. This has been resolved with a new release of GSearch. Because our Solr and Fedora are installed on separate servers, we cannot use GSearch’s reindex functionality. I did some initial testing with SSHFS and NFS for connecting the servers, but this approach is not very sustainable, and network or server errors can cause index desynchronization. We will develop a module to keep track of the sync status and perform reindexing as needed.
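The planned module does not exist yet, so the following Python sketch only shows one possible way to detect desynchronization: compare the last-modified time of each object in the repository with the time recorded in the index. The function name, the `demo:` identifiers and the data layout are assumptions for illustration; real code would read these values from Fedora and Solr.

```python
from datetime import datetime, timezone

def find_out_of_sync(repository, index):
    """Return (to_reindex, to_remove) based on last-modified timestamps.

    repository and index both map an object identifier to a datetime.
    """
    # Objects missing from the index, or indexed before their latest change.
    to_reindex = sorted(
        pid for pid, modified in repository.items()
        if pid not in index or index[pid] < modified
    )
    # Index entries whose object no longer exists in the repository.
    to_remove = sorted(pid for pid in index if pid not in repository)
    return to_reindex, to_remove

def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)

# Hypothetical state of the two servers.
repository = {
    "demo:1": utc(2014, 3, 1),
    "demo:2": utc(2014, 3, 5),
    "demo:3": utc(2014, 3, 7),
}
index = {
    "demo:1": utc(2014, 3, 1),
    "demo:2": utc(2014, 3, 2),  # indexed before the latest change
    "demo:9": utc(2014, 2, 1),  # object no longer in the repository
}

to_reindex, to_remove = find_out_of_sync(repository, index)
print(to_reindex)
print(to_remove)
```

A check like this could run periodically and after network errors, so that only the affected objects are reindexed instead of the whole repository.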


Posted in Data management, Fedora Commons, Software development