Towards open and sustainable information

We are proud to announce an upcoming seminar, Towards Open and Sustainable Information, to be held on 19–20 November 2014 in Mikkeli. The full program and registration are available at http://www.mamk.fi/sotu-osa.

It’s a free two-day seminar on topics such as:

  • Finnish national data exchange layer and Estonian X-Road solution
  • Digital preservation and archiving
  • Digital information management and standards
  • Open source and sustainable development
  • My data and human centric open data

We have an excellent list of high-profile speakers from international and national organizations, including the Ministry of Finance, the Estonian Ministry of Economic Affairs and Communications, the Ministry of Education and Culture, the University of Eastern Finland, the University of Hull, Plymouth University, Open Knowledge Finland, the National Archives of Finland, the National Archives of Sweden and others.

The seminar will be bilingual: the first day will be in English and cover international topics, while the second day will be in Finnish and focus more on national-level topics. During both seminar days there will be an exhibition by partners, private companies, organizations and projects operating in the seminar's fields. On the evening of the first day we will host a dinner for our speakers and guests. The details will be announced later, but there will be food and networking opportunities.

The seminar is hosted by the Open Source Archive (OSA) and SOTU-sähkö projects. As mentioned above, the seminar is free of charge, and we will provide coffee and a soup lunch. Participants will pay for the evening event themselves. The project publications will be available during the seminar; they provide detailed articles by the speakers on the project topics.

Finally, I would like to note that we will do our best to update the blog more often. As is often the case with projects, the schedule tightens towards the end, and we have lots of development going on as well as publications and events.

If you haven’t already checked out the seminar page, please do so now: http://www.mamk.fi/sotu-osa

Posted in Data management, Digital archiving, Publications, Seminar

Workflow engine under way

We started integrating a workflow engine into the OSA application last week. The workflow engine is being developed by Heikki Kurhinen as his bachelor’s thesis. User interfaces for batch ingest were already built for the beta version, but the underlying functionality has been missing.
We are going to develop several micro-services for OSA-specific workflows. These workflows will generate metadata automatically before we ingest documents into the archiving system. Metadata generated during workflow execution is stored temporarily in a MongoDB database. The aim is to bring automation to the ingest process: users can ingest several files at the same time, and the content will be handled in the same way throughout. Regardless of the ingest style (manual, or batch ingest using the workflow engine), the last step of the ingest process is still XML schema validation, which is already available in the current version. We have defined schemas for all data types to guarantee the minimum requirements of the Capture datastream. The schemas are stored in the content models in the Fedora Commons repository.
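
The micro-service idea can be sketched as a chain of small functions, each enriching an item's metadata before a final validation step that stands in for the XML schema check. This is a minimal illustration only; the function and field names below are made up, not the actual OSA interfaces.

```python
# Minimal sketch of a batch-ingest workflow: each "micro-service" is a
# function that enriches a metadata dict before a final validation step.
# All names here are illustrative, not the real OSA components.
import hashlib

def extract_technical_metadata(item):
    # Compute basic technical metadata from the file content.
    item["metadata"]["checksum"] = hashlib.md5(item["content"]).hexdigest()
    item["metadata"]["size"] = len(item["content"])
    return item

def add_provenance(item):
    # Add default provenance fields without overwriting existing values.
    item["metadata"].setdefault("source", "batch-ingest")
    return item

REQUIRED_FIELDS = {"title", "checksum", "size", "source"}

def validate(item):
    # Stand-in for the XML schema validation that ends the real ingest.
    missing = REQUIRED_FIELDS - item["metadata"].keys()
    if missing:
        raise ValueError(f"missing metadata fields: {missing}")
    return item

def run_workflow(item, steps=(extract_technical_metadata, add_provenance, validate)):
    for step in steps:
        item = step(item)
    return item

doc = {"content": b"example file bytes", "metadata": {"title": "Annual report"}}
result = run_workflow(doc)
print(sorted(result["metadata"]))  # ['checksum', 'size', 'source', 'title']
```

Because each step is an independent function, steps can be reordered or distributed without changing the items flowing through them.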

While developing new features for the OSA application, we are also gathering user feedback on the current version running at https://osa.mamk.fi. Of course we test the application ourselves, but testing mainly the newly created functionality cannot cover real-life use cases. Feedback from our project partners is essential for keeping the development roadmap and users’ real requirements in balance.

Posted in Uncategorized

Beta released

The Open Source Archive beta was published yesterday. You can visit it at https://osa.mamk.fi/. The default interface is used to search and inspect the public content (which is currently very limited). Once logged in, you can access the full feature set for information managers, researchers and archive administrators. If you would like a test account, just drop me an email or leave your details in a comment. In the future, we will also open public registration for test accounts. The final software will be released as open source and also provided as a SaaS model if you prefer a turnkey solution.

The current development emphasis has been on core features (access rights, archive management, ingest, description, searching and indexing) and, of course, the user interface. Next on the roadmap we have multiple pilot cases in which we will develop more advanced features and solutions to specific problems. A few highlights: batch and automated ingests, distributed workflows, and discovering and visualizing metadata and other content for end users (both researchers and non-researchers).

The project personnel and the control group were both satisfied; it has taken a lot of work to get this far. Still, the software is very much in development, and we will keep making it better, more pleasant and loaded with useful features. We would be grateful for all feedback and critical comments, and we take both very seriously. Weekly updates will be released to fix any inconveniences and bugs in the first beta. We don’t expect to produce perfect software by designing it all ourselves; that would be impossible. We learn and iterate, so each release will be better than the last.

Mikko

Posted in Archive system, Software development

Less than a week to beta

It’s soon time to launch the beta: it goes live next Tuesday. It’s a milestone, but since we practice near-continuous integration and can publish minor versions multiple times a week, it is not that drastic a change. The difference from the previous alpha is huge, though, and the beta is most likely going to evolve a lot during its first weeks. I will write more about the launch next week. At that point, we will also start planning the first pilot cases and begin implementing them as soon as possible.

There was a major milestone even before the beta launch: we got our local DAITSS installation working. DAITSS, developed by the Florida Center for Library Automation (FCLA), is our choice for the dark archive. Finding the best components and compatible software for DAITSS was quite a tricky task, but the system should be very stable once installed. We will next ingest data into it and try it in action. If all goes well, we can mirror and distribute the system to keep the data even safer.

The latest developments include unit testing and improvements to the access rights system and the user interface. Building a solid interface has been really time-consuming, even with feedback from our test users. Many assumptions made by developers turn out to be off the mark, and fine-tuning each detail takes time. The key is to identify and model the use cases and processes well. Luckily, some great thesis works on designing the user experience for digital archives were completed last year, with one more finishing this spring. If I could change one thing, I would start the interface design even earlier.

I will keep this post short: more time for beta development, less time writing about it.

Mikko

Posted in Daitss, Software development

Data management and upcoming conferences

First, we would like to announce that we will be participating in the Archiving 2014 conference in Berlin in May. The OSA project has two papers: Flexible Data Model for Linked Objects in Digital Archives by Mikko Lampi (MAMK) and Olli Alm (ELKA – Central Archives of Finnish Business Records), and Micro-services Based Distributable Workflow for Digital Archives by Heikki Kurhinen (MAMK/Otavan Opisto) and Mikko Lampi (MAMK). The first paper is about the data model designed during the Capture project; the model is implemented in the OSA software and will be developed further until the end of the project. The main technologies behind the model are the Fedora object model and RDF. The workflow paper is about the software Heikki is developing as his bachelor’s thesis. It is designed to be very simple and easy to integrate with any software.
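
As a rough illustration of the linked-object idea, relationships between archived objects can be expressed as RDF-style subject, predicate, object triples. The identifiers and predicates below are invented for the example, not the actual vocabulary of the Capture data model.

```python
# Illustrative RDF-style triples linking archive objects. PIDs and
# predicate names are made up for this sketch.
triples = [
    ("osa:doc1", "isPartOf", "osa:collection1"),
    ("osa:doc2", "isPartOf", "osa:collection1"),
    ("osa:doc2", "references", "osa:doc1"),
]

def objects_of(subject, predicate, graph):
    # Return all objects linked from `subject` via `predicate`.
    return [o for s, p, o in graph if s == subject and p == predicate]

print(objects_of("osa:doc2", "isPartOf", triples))  # ['osa:collection1']
```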

Here is the latest development news. We started implementing mass modification features for managing object metadata before and after ingest. Mass update is very useful for describing batches of files before ingesting them: adding common metadata about the owner or the origin can help the ingest and management processes. However, we found that modifying descriptive metadata after the initial ingest is not that simple. Therefore, in the upcoming beta version, mass updates are available only for files in the workspace before ingesting. We will continue to develop mass updating within the archive during the spring.
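
A pre-ingest mass update can be as simple as merging a set of common fields into every item in the workspace while preserving values that were already set. A minimal sketch, with hypothetical field names:

```python
# Sketch of a pre-ingest mass update: apply common descriptive metadata
# to every file waiting in the workspace. Field names are hypothetical.
def mass_update(workspace_items, common_fields):
    """Merge common_fields into each item, keeping existing values."""
    for item in workspace_items:
        for key, value in common_fields.items():
            item.setdefault(key, value)
    return workspace_items

batch = [
    {"filename": "report1.pdf"},
    {"filename": "report2.pdf", "owner": "ELKA"},  # already has an owner
]
mass_update(batch, {"owner": "MAMK", "origin": "donation-2014"})
print(batch[0]["owner"], batch[1]["owner"])  # MAMK ELKA
```

Using `setdefault` means the mass update never clobbers metadata a user has already filled in for an individual file.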

Many of the Fedora content models have been refactored to contain only the absolute minimum data required to understand and manage the objects. Information about forms, organization-specific mappings and the like was removed from the archive, because these are only views or interpretations of the data, not the data definition itself. This makes the design more consistent and allows organizations and users much better customization.

After the last post, we found out that the current GSearch solution didn’t support the latest Solr, which we require because of its Finnish language support. This has been resolved with a new release of GSearch. Because our Solr and Fedora are installed on separate servers, we cannot use GSearch’s reindex functionality. I did some initial testing with SSHFS and NFS for connecting the servers, but this approach is not very sustainable, and network or server errors can cause index desynchronization. We will develop a module to keep track of the sync status and perform reindexing as needed.
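
The planned sync-tracking module could work along these lines: compare each object's last-modified timestamp in the repository against the time it was last indexed, and queue anything stale for reindexing. A sketch with mocked data (the real module would query Fedora and Solr instead):

```python
# Sketch of sync tracking between a repository and a search index.
# Timestamps are plain integers here; real code would use datetimes
# fetched from Fedora and Solr.
def find_stale(repo_timestamps, index_timestamps):
    """Return object PIDs whose repo copy is newer than the indexed copy."""
    stale = []
    for pid, modified in repo_timestamps.items():
        indexed = index_timestamps.get(pid)
        if indexed is None or indexed < modified:
            stale.append(pid)
    return sorted(stale)

repo = {"osa:1": 100, "osa:2": 200, "osa:3": 300}
index = {"osa:1": 150, "osa:2": 180}  # osa:2 is outdated, osa:3 never indexed
print(find_stale(repo, index))  # ['osa:2', 'osa:3']
```

Running a check like this periodically would let the indexer recover from dropped messages without a full reindex.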

Mikko

Posted in Data management, Fedora Commons, Software development

Development news: digital workspace, access rights and other improvements

It’s time for a brief weekly recap. The general focus was on the user interface and access features. The overall direction remains as before, but we have moved on to the next piece of the whole.

Before opening public access to our beta software, we need to be sure that the data is secure. We have been working on an access rights filter and role-based privileges since the pre-alpha versions. The filter is based on an external LDAP, embedded in our application code, and indexed together with the data to avoid performance bottlenecks. We are also looking into adding support for external security software; this will be discussed more this spring.
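
Indexing access rights alongside the data means a query can be filtered by the requesting user's roles without extra lookups. In Solr this would typically be a filter query on a roles field; here is a minimal Python stand-in with invented role and field names:

```python
# Sketch of a role-based access filter: each document carries the roles
# allowed to see it, and results are filtered by the user's roles.
# Role and field names are invented for this example.
def visible_to(user_roles, documents):
    """Return documents whose allowed-roles set intersects user_roles."""
    return [d for d in documents if set(d["allowed_roles"]) & set(user_roles)]

docs = [
    {"id": "doc1", "allowed_roles": ["public"]},
    {"id": "doc2", "allowed_roles": ["researcher", "admin"]},
    {"id": "doc3", "allowed_roles": ["admin"]},
]
print([d["id"] for d in visible_to(["researcher"], docs)])  # ['doc2']
```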

The digital workspace is a place where you can inspect, filter and enrich files before ingesting them. It can be used to trigger workflows and manage automated ingests. Handling lots of files requires a clear interface and a good way to summarize the content. Our goal is to build a pre-archive workspace (if that is a term): it should be easy to filter, sort and group files to quickly decide what needs to be archived from an external hard disk or a thumb drive, for instance. The workspace can also monitor an FTP upload directory or a network directory. Later, we will add the ability to ingest batch metadata as an Excel, CSV or XML file. For now, the goal is to create a unified user interface and add a very simple ingest workflow; during the pilot tests we will expand the functionality.
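
The kind of triage described above, for example grouping incoming files by extension so a user can quickly see what a disk contains, can be sketched like this (the filenames are examples only):

```python
# Sketch of workspace triage: group incoming files by extension so a
# user can quickly decide what to archive.
from collections import defaultdict
from pathlib import PurePath

def group_by_extension(filenames):
    """Group filenames by lowercased extension; '(none)' for no extension."""
    groups = defaultdict(list)
    for name in filenames:
        groups[PurePath(name).suffix.lower() or "(none)"].append(name)
    return dict(groups)

incoming = ["minutes.PDF", "photo1.jpg", "photo2.JPG", "notes.txt", "README"]
groups = group_by_extension(incoming)
print(sorted(groups))  # ['(none)', '.jpg', '.pdf', '.txt']
```

The same grouping idea extends to date ranges, file sizes or detected formats, which is what makes a summary view of a whole disk practical.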

Now that we have the Solr schemas defined, we need to create an integration between Solr and Fedora Commons. Currently, GSearch and Apache ActiveMQ operate as middleware: GSearch listens to a message queue, where Fedora publishes messages about changes to XML and other files (datastreams), and sends the updates to Solr’s REST interface via XSLT. This will change once Fedora 4 reaches a stable beta and we can start working with it. The Solr schemas and the messaging will stay, but we are looking to replace GSearch with something simpler and more flexible.
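
The middleware flow can be pictured as a listener that receives a datastream-update message from the queue and maps it to a flat document for the search index. The message fields below are illustrative, not the exact Fedora message format or the GSearch/XSLT mapping:

```python
# Sketch of the queue-to-index mapping: a (mock) repository update
# message is flattened into a Solr-style document. Field names are
# illustrative only.
def to_solr_doc(message):
    """Map a mock datastream-update message to an indexable document."""
    return {
        "id": message["pid"],
        "datastream": message["dsID"],
        "event": message["method"],
    }

msg = {"pid": "osa:42", "dsID": "DC", "method": "modifyDatastreamByValue"}
doc = to_solr_doc(msg)
print(doc["id"], doc["datastream"])  # osa:42 DC
```

Keeping this mapping as a small, replaceable function is essentially the flexibility we want from whatever replaces GSearch.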

Still, there is lots of work to be done before the beta.

Mikko

Posted in Fedora Commons, Software development