Development news: digital workspace, access rights and other improvements

It’s time for a brief weekly recap. The focus was again on the user interface and access features: the overall direction is the same as before, but we have moved on to the next piece of the whole.

Before releasing public access to our beta software, we need to be sure that the data is secure. We have been working on an access rights filter and role-based privileges since the pre-alpha versions. The filter is based on external LDAP, embedded in our software itself, and indexed with the data to avoid performance bottlenecks. We are also looking into adding support for external security software. These topics will be discussed more this spring.
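The idea of indexing access rights with the data can be sketched roughly as follows. This is a minimal illustration, not our actual implementation: the field name `allowed_groups` and the records are invented, and in practice the group memberships would come from LDAP.

```python
# Sketch: role-based access filtering applied at index level. Assumption:
# each indexed record carries the groups allowed to read it (resolved from
# LDAP at index time), so filtering costs one extra query clause, not a
# per-document permission lookup.
def build_access_filter(user_groups):
    """Build a Solr-style filter query restricting results to readable records."""
    clauses = " OR ".join('allowed_groups:"%s"' % g for g in sorted(user_groups))
    return "(%s)" % clauses

def readable(records, user_groups):
    """Apply the same rule in-process: keep only records the user may read."""
    groups = set(user_groups)
    return [r for r in records if groups & set(r.get("allowed_groups", []))]

records = [
    {"id": "doc-1", "allowed_groups": ["archivists"]},
    {"id": "doc-2", "allowed_groups": ["admins"]},
]
print(build_access_filter(["archivists"]))  # (allowed_groups:"archivists")
print([r["id"] for r in readable(records, ["archivists"])])  # ['doc-1']
```

The point of the approach is that the access check rides along with every search instead of being a separate round trip to the directory server.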


The digital workspace is a place where you can inspect, filter and enrich files before ingesting them. It can be used to trigger workflows and manage automated ingests. Handling lots of files requires a clear interface and a good way to summarize the content. Our goal is to build a pre-archive workspace (if that is a term): it should be an easy task to filter, sort and group files to quickly decide what needs to be archived from an external hard disk or a thumb drive, for instance. The workspace can also be used to monitor an FTP upload directory or a network directory. Later, we will add the ability to ingest batch metadata from Excel, CSV or XML files. For now, the goal is to create a unified user interface and a very simple ingest workflow. During the pilot tests we will expand the functionality.
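The file-triage idea can be sketched with a few lines of code. This is only an illustration of the summarizing step, with invented file names and a simple group-by-extension rule; the real workspace will group and filter in more ways than this.

```python
# Sketch of pre-archive triage: summarize a pile of files so a user can
# quickly decide what to ingest. Grouping by extension is one example rule.
from collections import defaultdict
from os.path import splitext

def summarize(files):
    """Group (name, size_bytes) pairs by extension; count files and total bytes."""
    groups = defaultdict(lambda: {"count": 0, "bytes": 0})
    for name, size in files:
        ext = splitext(name)[1].lower() or "(none)"
        groups[ext]["count"] += 1
        groups[ext]["bytes"] += size
    return dict(groups)

files = [("report.pdf", 120_000), ("scan.tiff", 900_000), ("notes.pdf", 80_000)]
summary = summarize(files)
print(summary[".pdf"])  # {'count': 2, 'bytes': 200000}
```

The same summary, rendered as facets in the interface, is what lets a user scan a thumb drive's worth of files at a glance instead of one file at a time.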

Now that we have the Solr schemas defined, we need to create an integration between Solr and Fedora Commons. Currently, GSearch and Apache ActiveMQ operate as middleware: GSearch listens to a message queue, where Fedora publishes messages about changes to XML and other files (datastreams), transforms them with XSLT and sends them to Solr’s REST interface. This will change once Fedora 4 reaches a stable beta and we can start working with it. The Solr schemas and messaging will stay, but we are looking to replace GSearch with something simpler and more flexible.
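In essence, the middleware's job is to turn a repository change message into a Solr update. The sketch below shows that mapping step in isolation; the message fields (`action`, `pid`, `dsid`, `title`) and the Solr field names are assumptions for illustration, not Fedora's actual message format or our schema.

```python
# Sketch of the transformation the middleware performs: one change message
# from the queue in, one Solr update document out. In the real pipeline the
# result would be POSTed to Solr's REST update endpoint.
def to_solr_update(message):
    """Map a repository change message onto a flat Solr update document."""
    if message.get("action") == "purge":
        # Removed objects must be deleted from the index as well.
        return {"delete": {"id": message["pid"]}}
    return {"add": {"doc": {
        "id": message["pid"],
        "datastream": message.get("dsid", ""),
        "title": message.get("title", ""),
    }}}

msg = {"action": "modify", "pid": "osa:123", "dsid": "DC", "title": "Annual report"}
print(to_solr_update(msg)["add"]["doc"]["id"])  # osa:123
```

GSearch does this with XSLT; a replacement could do the same mapping in plain code, which is part of what we mean by "simpler and more flexible".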

Still, there is lots of work to be done before the beta.


Posted in Fedora Commons, Software development

Development news: user interface, usability, search and lots of iterations

First, I would like to introduce a change to the posting schedule: I’ll try to update the blog at the start of the week instead of my traditional Friday post. It serves the project better to open the week with a fresh post and introduce some ideas than to try to catch the attention of people already oriented toward the weekend.

To recap the last week, we continued in the same direction as before. Lots of effort was put into the search and indexing features. We now have an understanding of how Solr handles field types for faceting and sorting, and of what kind of schema is efficient. We ended up sharding the metadata and the rich text contents: for the most part, preserved documents don’t change, while metadata, on the other hand, can be added and enriched over time. The user interface received improvements as well. Lots of content is loaded and saved with Ajax to minimize loading times and keep the experience intact: users can keep working while actions are performed on the server, and only the status is updated on the screen. These are quite basic features in web applications, but their absence can severely harm the user experience and make the software unpleasant to use. Another UI improvement was the consolidation of multiple pages and features into intuitive entities.
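The field-type lesson can be illustrated with a schema fragment of the kind we mean. The field names here are examples only, not our actual schema: faceting and sorting want unanalyzed single-token fields, while searching wants an analyzed text field, so the same source value is often copied into both.

```xml
<!-- Illustrative schema.xml fragment (example field names, not our schema):
     facet and sort on unanalyzed fields, search on an analyzed text field. -->
<field name="organization"       type="text_fi" indexed="true" stored="true"/>
<field name="organization_facet" type="string"  indexed="true" stored="false"/>
<field name="created_sort"       type="tdate"   indexed="true" stored="false"/>
<copyField source="organization" dest="organization_facet"/>
```

Highly dynamic fields make this kind of tuning impossible, which is one reason we are moving toward explicit declarations for the key fields.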

As the software is built first for Finnish use, we integrated the Standard Industrial Classification (2008) by Statistics Finland and another classification by the Finnish Business Archive Association. They are imported as read-only contextual objects and can be used to describe the content of any organization using the system.

We continued iterating to improve the code quality and the features developed during the alpha stages. Lots of work still needs to be done before we reach a sustainable level, but by proceeding this way we have months of feedback and testing on the core features. I think building the features to a finished state up front would have taken considerably more time and would, in the end, have resulted in worse quality. At the least, there would have been a major risk of building something not needed and prioritizing the wrong features.

Open Repositories 2014 is coming to Helsinki this year. We will be there on the Fedora tech track, most likely sharing experiences and introducing the project and the latest developments.


Posted in Fedora Commons, Software development

Solr, data and minor updates

The week has been busy but as promised here are some insights into the latest development. The main focus has been data: Solr indexing and configuration, data stores, user interfaces for searching the data and other supporting features.

Understanding the language in which documents and metadata are written is very important, so we had to teach Solr some Finnish. Solr handles widely used languages like English with ease, but Finnish requires a dictionary and complex rules. We already had some experience developing search features with Solr and knew how real production data behaves. When we started experimenting with Solr, there were no open and free tools for the Finnish language. Voikko existed, of course, but not under a license we could use in commercial services and on our terms. With the new licensing and the work done on the Solr Voikko plugin by the National Library’s KDK project, we upgraded Solr’s understanding of the language to the next level. (Voikko is the same language tool used for Open Office’s Finnish features.)

I’ve been learning how to build more optimized Solr schemas and configurations. At first we went with highly dynamic declarations to avoid rebuilding indexes and making the system too fixed, but it now seems that this has a negative effect on performance and on certain key features like sorting and faceting by field. Our experience and production data provide a good starting point, but tuning the settings and finding the best analyzers, field types and such is a task not to be rushed or taken too lightly. Again, KDK provides a nice reference with their public schemas (available on GitHub). We will publish our own schemas and configurations with the software, and we will contribute to the Voikko Solr plugin if we need to modify it.
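For readers curious what "teaching Solr Finnish" looks like in practice, a field type wired to a Voikko-based filter would be declared roughly like this. Note that the filter factory class name below is an assumption for illustration; check the KDK/National Library plugin itself for the real class and its parameters.

```xml
<!-- Sketch of a Finnish text field type using a Voikko-based token filter.
     The filter class name is an ASSUMED placeholder, not the plugin's
     actual class; the tokenizer and lowercase filter are standard Solr. -->
<fieldType name="text_fi" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="voikko.VoikkoFilterFactory"/> <!-- assumed name -->
  </analyzer>
</fieldType>
```

The Voikko filter is what reduces inflected Finnish word forms to their base forms, which plain stemmers handle poorly for Finnish.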

At the moment, we are still using Fedora 3.x and GSearch to feed Solr. GSearch takes messages sent by Fedora and transforms them into Solr documents for its REST interface. During the spring or early summer, we hope to migrate to Fedora 4, which eliminates the need for GSearch and simplifies the setup. For other data stores, RDF databases and engines look very interesting. Fedora 3 ships with Mulgara, and we will use it until the migration. Apache Jena looks like an interesting alternative, but we are still in discovery mode with it.
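To give a flavor of what an RDF store buys us, relationships between objects can be queried with SPARQL. The query below is illustrative: the collection PID is invented, though the relationship namespace shown is the style Fedora 3 uses for external relations.

```sparql
# Illustrative SPARQL: list objects that are members of a given collection.
# The collection PID is an example, not a real object in our repository.
PREFIX rel: <info:fedora/fedora-system:def/relations-external#>
SELECT ?member WHERE {
  ?member rel:isMemberOfCollection <info:fedora/osa:collection-1> .
}
```

Whether the store is Mulgara or Jena, the queries stay essentially the same, which keeps the migration risk manageable.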

The search user interface is becoming more and more polished; we are working on faceting at the moment. Many of the features, as well as the overall look and feel, come from benchmarking and from our active partners and their users. Bug fixes and other small improvements are made on a daily basis. We also embedded the official Standard Industrial Classification by Statistics Finland to speed up ingest and description.

More updates coming next week.


Posted in Data management, Fedora Commons, Software development

From alpha to beta

With the year 2014, winter has finally come to Finland. In addition, development is running at full speed. Our target is to hit the beta release at the beginning of March. The roadmap is a combination of user feedback from the earlier agile sprints and the specifications made during the Capture project. The main focus of the beta sprint is on the user interface, search and indexing, and the ingest and management of data. All the core features have been in place since the alpha release, but using them has required some knowledge and a good understanding of how things are in the early stages of software development.

I will post some previews of the user interface later. The interface will be in English and Finnish, and it can be localized and customized easily to support your desired language, regional preferences and organization policies. Effort has also been put into making the interface user friendly: since the early pre-alpha versions, we have iterated on the user interface with the designated users. Stay tuned for interface mockups or even screenshots.

Searching and indexing will be implemented mainly with Solr, the de facto open source engine for this kind of content. Finna (a nice user interface for accessing Finnish archives, libraries and museums) also uses Solr. Finnish has traditionally been a somewhat tricky language, but now we have good support and can benefit from open source work already done. We have a working integration between our data repository (Fedora Commons) and Solr, but we still need to work on indexing metadata and rich text contents and on building the end-user features. We will build a new kind of search and browse interface based on faceting and visualizing the data, instead of just an empty search box waiting for you to know what you need to type in. Of course, the Google-like search box will be there too if you like it better.
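The difference between the empty search box and a browse-first interface comes down to the query the interface issues. The sketch below shows how such a faceted request could be assembled; the field names are invented examples, not our production schema, and the parameter names are standard Solr faceting parameters.

```python
# Sketch of a faceted Solr query of the kind a browse interface issues:
# even with no search text, faceting lets the user explore what exists.
def facet_query(text, facet_fields, rows=10):
    """Build query parameters for Solr's /select handler with faceting on."""
    params = [
        ("q", text or "*:*"),     # empty box still browses everything
        ("rows", str(rows)),
        ("facet", "true"),
        ("facet.mincount", "1"),  # hide empty facet values
    ]
    params += [("facet.field", f) for f in facet_fields]
    return params

params = facet_query("", ["organization_facet", "document_type"])
print(dict(params)["q"])  # *:*
```

The returned facet counts are what the interface visualizes, so users see the shape of the archive before typing anything.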

The third large sprint readies the tools for the ingest and management of the archive data. We will build a workspace that contains pre-ingest and discovery tools, manual ingest, workflows and batch ingest, as well as management of the existing data. Again, we will make it highly configurable and suitable for ingesting data of any kind, not just documents or strictly formatted metadata. For the beta and pilot phases, we will introduce only reference features, but they can be extended and will be well documented.

Lastly, the simple workflow engine has now been completed and will be published as a standalone project on GitHub later this year. The developer, Heikki Kurhinen, wrote it for the project as his thesis work. The engine will be featured in a later article as soon as we get it integrated with our main software.

I will write more on the topics introduced as our team progresses.


Posted in Data management, Software development

Alpha release

Last week we reached a major milestone and released the first alpha version of our software. We defined the alpha as functional software that wraps up the development done during last summer’s sprints. The next focus is iterative improvement, better code quality and a public beta release by the end of February.

Last time I wrote about agile development and our experiences from the summer development sprints. While the results were good and mostly positive, there is always room for improvement. I think better code management and thorough unit testing need to be involved: it took a couple of weeks of integrating, testing and bug fixing to get here. Sure, there were also iterations based on early feedback, but this is something not to overlook.

The alpha release is primarily targeted at the project partners who participate in early development. We don’t want to scare the public with our hideous development interfaces: using the current user interface requires a good understanding of the underlying processes and functionality. The beta release will introduce a new user interface, better quality code and documentation. If you’re interested, this is the time to get involved. We will also publish a live demo.

Alpha features include:

  • Metadata model (known as the Capture model)
  • Fedora Commons content models (document models, place, event, action, agent)
  • Metadata management (forms, validation, indexing, linked data)
  • Web based ingest forms
  • Archive file management (archive copies, attachments, previews for images and PDFs)
  • Search and browse user interfaces
  • Full-text indexing
  • User and access management
  • Ingest and data filtering workspace prototype
  • Feedback module with screen capture
  • Developer friendly user interface

And there is much more going on under the hood. Comment or send an email if you’re interested in those.


Posted in Software development

Modeling batch ingest process and developing workflow engine

I worked as a programmer for the OSA project during the summer; however, because of our fast development pace, I never found time to write to this blog. Now that I’m back at school for the last year of my studies, I have decided to do my bachelor’s thesis for the OSA project.

The topic is developing a (hopefully) simple workflow engine that will handle the execution of micro-services. Communication with the workflow engine will be over REST, so that users can develop their own UI or easily integrate the engine into their own projects. There also needs to be support for timed runs and different triggers; for these I’m going to use the Quartz scheduling library. The possibility of running the workflow engine as a distributed application also needs to be researched.
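The core idea of the engine can be reduced to a very small sketch: a workflow is an ordered chain of micro-services, each taking and returning a context, with execution stopping on failure. The real engine is far richer (REST-controlled, Quartz-scheduled, written for the project in Java); the names and steps below are invented for illustration only.

```python
# Deliberately minimal sketch of the workflow-engine idea: an ordered list
# of micro-services sharing a context dict, with fail-fast semantics.
def run_workflow(services, context):
    """Run each micro-service in order, stopping on the first failure."""
    for service in services:
        context = service(context)
        if context.get("failed"):
            break
    return context

# Two toy micro-services standing in for real ingest steps.
def checksum_step(ctx):
    ctx["checksummed"] = True
    return ctx

def ingest_step(ctx):
    ctx["ingested"] = ctx.get("checksummed", False)
    return ctx

result = run_workflow([checksum_step, ingest_step], {"pid": "osa:123"})
print(result["ingested"])  # True
```

Everything else in the engine (REST endpoints, Quartz triggers, distribution) is machinery wrapped around this one loop.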

I’m also going to create a batch ingest workflow and the required micro-services as an example of how to create workflows with my engine. That work started by modeling the batch ingest workflow. Because I don’t have much archiving experience, I started out by looking at other archiving systems like Archivematica and Islandora. After inspecting these systems and doing my own research, I could start modeling.

To draw the actual model I used software called yEd, which was free and easy to use. I didn’t want to use overly complex modeling methods, because I think it is important to create models that can also be understood by people without deep technical knowledge. For example, someone who has worked with archiving for a long time but doesn’t understand that much about programming or process modeling can still see what is going on in the workflow, and may be able to provide important knowledge about archiving.

Here is the current model:


Of course, this process will later be fully configurable, but these are the settings for now.

Posted in Software development

Agile and user focused development

Agile development is not only hype. Our experience with the approach this summer was very positive: we went from a technology proof-of-concept to the alpha version very fast, in only four months. Now we are almost there; just a small round of bug fixing and polishing remains. We are already online with a private prototype and hope to make the beta version a public release by the end of the year.

For many businesses and organizations, user-focused development and user-focused products are an important strategy. Likewise, Mikkeli University of Applied Sciences has made user focus a key part of its strategy for digital archiving and services. The other two focus areas are the productization and development of digital information management and archiving.

I gave a presentation about agile and user-focused development a few weeks back at Liikearkistopäivät 2013 (a three-day event organized by the Finnish Business Archive Association, FBAA). The key points and some experiences from our sprint are presented below.

For those who understand Finnish, here is a small piece of comic relief for starters.

There are two drivers that usually lead to weak results and take the focus off the user: management-driven (bureaucratic) development and IT-driven development, even though both are excellent tools in themselves. It has also been known for 40 years that the waterfall model is not suitable for software projects. Development should always be driven by the need and by the people who will use the result. By making the development process open and transparent, you can engage people and reduce resistance to change. You could even make the solution open source. But opening up the software or the decision making is not a shortcut to success: there needs to be a clear understanding of the goals and the means to achieve them.

It should not be surprising how much attitude problems can escalate. Common examples include:
“What could an end user possibly know about designing and implementing an IT system?”
“We cannot show unfinished software to the customers.”
“User research is too expensive, and humbug anyway.”
“Our business is a special case. We cannot use any standard solution.”
“The software should adapt to our legacy processes.”
The list could go on and on. Both sides, the customer and the provider, have attitude issues that need to be fixed.

Responsibilities should be made clear to everyone involved. It is much easier to know who is responsible for what if the whole project is managed in small pieces. Dividing colossal projects would be a good start, at least for the public sector and the government. The idea is not that hard, and even some Finnish ministries have added user friendliness to their strategies and recommendations.

There are tens or hundreds of methods for user research at any point in the process, and the complete life cycle of products, projects and processes can be managed. Websites like All About UX contain very useful information.

Agile development is not a single methodology but rather a term that covers a wide range of quite different approaches. They should all share the ideas behind the Agile Manifesto.

In our summer sprint, we made use of feedback-driven development and prototyped features at a rapid pace. The test users were the actual end users of the future system. We could change the implementation on a weekly basis and add or remove features without cumbersome change processes. It is the only way to build software for a business without being an actual professional in that business; and we software developers rarely are.

Mikko Lampi

Posted in Software development