Daisy documentation

1 Documentation Home

These pages contain the documentation of the Daisy 2.1 release.

See also:

The documentation is also available published as a Daisy-book.

For an end-user introduction to Daisy, have a look at the video tutorials.

2 Installation

2.1 Downloading Daisy

Packaged versions of Daisy can be found in the distribution area (Sourceforge). These include everything required to run Daisy, except for a Java Virtual Machine and MySQL.

If you don't have these already, the installation of these will be covered further on.

Consider subscribing to the Daisy mailing list to ask questions and talk with fellow Daisy users and developers.

There is also information available about the source code.

2.2 Installation Overview

Daisy is a multi-tier application, consisting of a repository server and a publication layer. Next to those, a database server (MySQL) is required. All together, this means three processes, which can run on the same server or on different servers.

The Daisy binary distribution packs most of the needed software together; the only additional things you'll need are a Java Virtual Machine for your platform and MySQL. All libraries and applications shipped with Daisy are the original, unmodified distributions that will be configured as part of the installation. We've only grouped them in one download for your convenience.

If you follow the instructions in this document, you can have Daisy up and running in less than an hour.

The diagram below gives an overview of the setup. All port numbers shown are configurable.

2.2.1 Platform Requirements

We have tested the Daisy installation on Windows 2000/XP, GNU/Linux and MacOSX. Other unixes like Solaris should also work, though we don't test that ourselves.

2.2.2 Memory Requirements

By default, the Daisy Wiki and Daisy Repository Server are each started with a maximum heap size of 128 MB. To this you need to add some overhead for the JVMs themselves, and then some memory for MySQL, the OS and its (filesystem) caches. This doesn't mean all this memory will be used; that depends on usage intensity.

2.2.3 Required knowledge

These installation instructions assume you're comfortable with installing software, editing configuration (XML) files, running applications from the command line, setting environment variables, and that sort of stuff.

2.2.4 Can I use Oracle, PostgreSQL, MS-SQL, ... instead of MySQL? Websphere, Weblogic, Tomcat, ... instead of Jetty?

Daisy contains the necessary abstractions to support different database engines, though we currently only support MySQL. Users are welcome to contribute and maintain support for other databases (ask on the mailing list how to get started).

The Daisy Wiki webapp should be able to run in any servlet container (at least one that can run unpacked webapps, and as far as there aren't any Cocoon-specific issues), but we ship Jetty by default. For example, using Tomcat instead of Jetty is very simple and is described on this page.

2.3 Installing a Java Virtual Machine

Daisy requires the Java JDK or JRE 1.5 or 1.6 (these versions are also known as 5 and 6). You can download it from the Sun site (preferably take the JDK, not the JRE). Install it now if you don't have it already.

After installation, make sure the JAVA_HOME environment variable is defined and points to the correct location (i.e., the directory where Java is installed). To verify this, open a command prompt or shell and enter:

For Windows:
"%JAVA_HOME%\bin\java" -version

For Linux:
$JAVA_HOME/bin/java -version

This should print out something like:

java version "1.5.0"

or

java version "1.6.0"
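
The same check can be made from inside a running JVM. The following is a minimal sketch, not part of Daisy: the class and method names are hypothetical, and it only assumes that the version string has the "1.5.x" or "1.6.x" form shown above.

```java
// Hypothetical helper, not part of Daisy: checks whether a version string such
// as the one printed by "java -version" is one of the releases named above.
class JavaVersionCheck {

    // Returns true for 1.5.x and 1.6.x version strings, e.g. "1.5.0" or "1.6.0_03".
    static boolean isSupportedByDaisy21(String version) {
        return version.startsWith("1.5.") || version.startsWith("1.6.")
                || version.equals("1.5") || version.equals("1.6");
    }

    public static void main(String[] args) {
        // "java.version" is a standard system property; JAVA_HOME may be unset,
        // in which case null is printed.
        String running = System.getProperty("java.version");
        System.out.println("JAVA_HOME    = " + System.getenv("JAVA_HOME"));
        System.out.println("java.version = " + running);
        System.out.println("Listed for Daisy 2.1: " + isSupportedByDaisy21(running));
    }
}
```

If JAVA_HOME prints as null, the environment variable is not visible to the process and should be set before continuing.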

2.3.1 Installing JAI (Java Advanced Imaging) -- optional

If you want images (especially PNG) to appear in PDFs, it is highly advisable to install JAI, which you can download from the JAI project on java.net. Take the JDK (or JRE) package, as this makes JAI support globally available.

2.4 Installing MySQL

Daisy requires one of the following MySQL versions:

MySQL can be downloaded from mysql.com. Install it now, and start it (often done automatically by the install).

Windows users can take the "Windows Essentials" package. During installation and the configuration wizard, you can leave most things to their defaults. In particular, be sure to leave the "Database Usage" to "Multifunctional Database", and leave the TCP/IP Networking enabled (on port 3306). When it asks for the default character set, select "Best Support For Multilingualism" (this will use UTF-8). When it asks for Windows options, check the option "Include Bin Directory In Windows Path".

Linux users: install the "MySQL server" and "MySQL client" packages. Installing the MySQL server RPM will automatically initialize and start the MySQL server.

2.4.1 Creating MySQL databases and users

MySQL is used by both the Daisy Repository Server and JMS (ActiveMQ). Therefore, we are now going to create two databases and two users.

Open a command prompt, and start the MySQL client as root user:

mysql -uroot -pYourRootPassword

On some systems, the root user has no password, in which case you can drop the -p parameter.

Now create the necessary databases, users and access rights by entering (or copy-pasting) the commands below in the mysql client. The string that follows IDENTIFIED BY is the password for the user, which you can change if you wish. The @localhost entries are necessary because otherwise the default access rights for anonymous users @localhost will take precedence. If you run MySQL on the same machine as the Daisy Repository Server, you only need the @localhost entries.

CREATE DATABASE daisyrepository CHARACTER SET 'utf8';
GRANT ALL ON daisyrepository.* TO daisy@'%' IDENTIFIED BY 'daisy';
GRANT ALL ON daisyrepository.* TO daisy@localhost IDENTIFIED BY 'daisy';
CREATE DATABASE activemq CHARACTER SET 'utf8';
GRANT ALL ON activemq.* TO activemq@'%' IDENTIFIED BY 'activemq';
GRANT ALL ON activemq.* TO activemq@localhost IDENTIFIED BY 'activemq';

2.5 Extract the Daisy download

Extract the Daisy download. On Linux/Unix you can extract the .tar.gz file as follows:

tar xvzf daisy-<version>.tar.gz

On non-Linux unixes (Solaris notably), use the GNU tar version if you experience problems extracting.

On Windows, use the .zip download, which you can extract using a tool like WinZip.

After extraction, you will get a directory called daisy-<version>. This directory is what we will call from now on the DAISY_HOME directory. You may set a global environment variable pointing to that location, or you can do it each time in the command prompt when needed.

2.6 Daisy Repository Server

2.6.1 Initialising and configuring the Daisy Repository

Open a command prompt or shell and set an environment variable DAISY_HOME, pointing to the directory where Daisy is installed.

Windows:
set DAISY_HOME=c:\daisy-2.1

Linux:
export DAISY_HOME=/home/daisy_user/daisy-2.1

Then go to the directory <DAISY_HOME>/install, and execute:

daisy-repository-init

Follow the instructions on screen. The installation will (1) initialize the database tables for the repository server and (2) create a Daisy data directory containing customized configuration files.

2.6.2 Starting the Daisy Repository Server

Still in the same command prompt (or in a new one, but make sure DAISY_HOME is set), go to the directory <DAISY_HOME>/repository-server/bin, and execute:

daisy-repository-server <location-of-daisy-data-dir>

In which you replace <location-of-daisy-data-dir> with the location of the daisy data directory created in the previous step.

Starting the repository server usually takes only a few seconds. The first time, however, it will take a bit longer because the workflow database tables are created during startup. When the server has finished starting, it will print a line like this:

Daisy repository server started [timestamp]

Wait for this line to appear (the prompt will not return).

2.7 Daisy Wiki

2.7.1 Initializing the Daisy Wiki

Before you can run the Daisy Wiki, the repository needs to be initialised with some document types, a "guest" user, a default ACL configuration, etc.

Open a command prompt or shell, make sure DAISY_HOME is set, go to the directory <DAISY_HOME>/install, and execute:

daisy-wiki-init

The program will start by asking for a login and password. Enter the user created during the execution of daisy-repository-init (the default was testuser/testuser). It will also ask for the URL where the repository is listening; you can simply press enter here.

If everything goes according to plan, the program will now print out some informational messages and end with "Finished.".

2.7.2 Creating a "wikidata" directory

Similar to the data directory of the Daisy repository server, the Daisy Wiki also has its own data directory (which we call the "wikidata directory").

To set up this directory, open a command prompt or shell, make sure DAISY_HOME is set, go to the directory <DAISY_HOME>/install, and execute:

daisy-wikidata-init

and follow the instructions on-screen.

Since the Daisy Wiki and the Daisy repository server are two separate applications (which might be deployed on different servers), each has its own data directory.

2.7.3 Creating a Daisy Wiki Site

The Daisy Wiki has the concept of multiple sites, which are multiple views on top of the same repository. You need at least one site to do something useful with the Daisy Wiki, so we are now going to create one.

Open a command prompt or shell, make sure DAISY_HOME is set, go to the directory <DAISY_HOME>/install, and execute:

daisy-wiki-add-site <location of wikidata directory>

The application starts by asking the same parameters as for daisy-wiki-init.

Then it will ask for a name for the site. This should be a name without spaces. If nothing comes to mind, enter something like "test" or "main".

Then it will ask for the sites directory location, for which the presented default should be OK, so just press enter.

2.7.4 Starting the Daisy Wiki

Open a command prompt or shell and make sure DAISY_HOME is set.

Go to the directory <DAISY_HOME>/daisywiki/bin, and execute:

daisy-wiki <location of wikidata directory>

Background info: this will start Jetty (a servlet container) with the webapp found in <DAISY_HOME>/daisywiki/webapp.

2.8 Finished!

Now you can point your web browser to:

http://localhost:8888/

To be able to create or edit documents, you will have to log in as a different user. You can use the user you created for yourself while running daisy-repository-init (the default was testuser/testuser).

To start the Daisy repository server and Daisy Wiki after the initial installation, see the summary here, or even better, set up service (init) scripts to easily/automatically start and stop Daisy.

2.9 2.0(.x) to 2.1 changes

2.10 2.0(.x) to 2.1 compatibility

2.10.1 Skin compatibility

2.10.1.1 XSL-FO (stylesheets for PDF)

Daisy 2.1 ships with a major new release of the XSL-FO processor, FOP 0.93. If you have custom XSL-FO stylesheets, there may be minor compatibility issues.

2.10.2 Repository extensions, authentication schemes, etc

2.10.2.1 New Runtime

Since we moved from Avalon Merlin to the new Daisy Runtime, you will need to adjust your repository extensions, authentication schemes, etc. to be compatible with the new infrastructure.

Some pointers to more information:

If you have trouble adjusting your extensions or understanding the new system, you can ask questions on the Daisy mailing list.

2.10.2.2 Package move

Most of the SPI classes have been moved to different packages. For example:

org.outerj.daisy.authentication => org.outerj.daisy.authentication.spi

It should be easy to adjust your classes, which you'll need to do anyhow for the new Daisy Runtime.

2.10.2.3 AbstractAuthenticationFactory

This class is deprecated (and non-functional). If you used this, see the updated NTLM and LDAP authentication schemes for how to update your code.

2.10.3 Publisher wraps exception

The Publisher now wraps any exception occurring in the publisher with a GlobalPublisherException, containing information on the execution stack of the publisher. This might have effects on how you handle exceptions coming from the publisher. For example you might do a catch for GlobalPublisherException and then do getCause() on it to get the actual exception.

The error.xsl has also been changed to hide the GlobalPublisherException.
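
The catch-and-unwrap approach mentioned for the Publisher can be sketched as follows. This is an illustration only: GlobalPublisherException is declared here as a local stand-in so the example compiles without the Daisy jars (the real class, its package, and whether it is a checked exception may differ), and callPublisher is a hypothetical placeholder for an actual publisher request.

```java
// Stand-in for Daisy's GlobalPublisherException, declared locally so the sketch
// compiles without the Daisy jars. The real class and its package may differ.
class GlobalPublisherException extends Exception {
    GlobalPublisherException(String message, Throwable cause) {
        super(message, cause);
    }
}

class PublisherErrorHandling {

    // Hypothetical placeholder for a publisher call that fails.
    static void callPublisher() throws GlobalPublisherException {
        Exception actual = new IllegalStateException("document not found");
        throw new GlobalPublisherException("Error in publisher request", actual);
    }

    // Returns the wrapped exception if there is one, otherwise the wrapper itself.
    static Throwable unwrap(GlobalPublisherException e) {
        return e.getCause() != null ? e.getCause() : e;
    }

    public static void main(String[] args) {
        try {
            callPublisher();
        } catch (GlobalPublisherException e) {
            // Report the actual exception rather than the wrapper.
            Throwable actual = unwrap(e);
            System.out.println(actual.getClass().getName() + ": " + actual.getMessage());
        }
    }
}
```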

2.10.4 Book publisher

2.10.4.1 If you're using custom book publication types

The shiftHeaders task has been deprecated, the heading shifting is now performed as part of the assembleBook task. This change had to be made in order to implement the new heading shifting for document includes.

Normally, you don't need to adjust anything: the shiftHeaders task still exists but now does nothing at all. To avoid future confusion, it is recommended you remove the shiftHeaders task from any custom book publication types you might have.

2.10.5 Changes to non-public things

The following are changes to Daisy internals that might be relevant for some users.

2.10.5.1 Constants.DAISY_LINK_PATTERN

This is not really a part of the public API, but if you would happen to use the regex pattern defined in Constants.DAISY_LINK_PATTERN, you might have to adjust your code because the structure of this pattern has changed a bit: meaningless groups have been changed into non-capturing groups. See the javadoc of that constant for the exact matching groups.
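
To see why such a change can affect calling code, consider how group numbers shift when a capturing group becomes non-capturing. The patterns below are made up for this illustration and are not the actual Constants.DAISY_LINK_PATTERN; only the java.util.regex behaviour is real.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingDemo {
    // Illustrative patterns only; these are NOT Constants.DAISY_LINK_PATTERN.
    static final Pattern CAPTURING = Pattern.compile("^(doc):(\\d+)(@(\\w+))?$");
    static final Pattern NON_CAPTURING = Pattern.compile("^(?:doc):(\\d+)(?:@(\\w+))?$");

    // Returns the text of the given group, or null if the input does not match.
    static String group(Pattern pattern, String input, int index) {
        Matcher m = pattern.matcher(input);
        return m.matches() ? m.group(index) : null;
    }

    public static void main(String[] args) {
        String link = "doc:123@main";
        System.out.println(CAPTURING.matcher(link).groupCount());     // 4 groups
        System.out.println(NON_CAPTURING.matcher(link).groupCount()); // 2 groups
        // The number "123" moves from group 2 to group 1.
        System.out.println(group(CAPTURING, link, 2));
        System.out.println(group(NON_CAPTURING, link, 1));
    }
}
```

Code that reads groups by index therefore needs its indexes rechecked against the javadoc of the constant.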

2.10.5.2 Change to htmlcleaner.xml

The pre element now allows a daisy-shift-headings attribute for the new heading shifting for document includes feature.

2.10.6 Automated installation

When specifying a property file to the repository-server-init script, two new properties are now required: dbName and jmsDbName, containing the names of the databases (the same names as those in the JDBC URLs).

2.11 2.0(.x) to 2.1 upgrade

These are the upgrade instructions for when you currently have Daisy 2.0 or 2.0.1 installed.

If you have 2.1-RC installed, see here.

2.11.1 Upgrading

2.11.1.1 Daisy installation review

In case you're not very familiar with Daisy, it is helpful to identify the main parts involved. The following picture illustrates these.

There is the application directory, which is simply the extracted Daisy download, and doesn't contain any data (to be safe don't remove it yet though).

Next to this, there are 3 locations where data (and configuration) is stored: the relational database (MySQL), the repository data directory, and the wiki data directory. The Daisy repository and the Daisy Wiki are two independent applications, therefore each has its own data directory.

The text between the angle brackets (< and >) is the way we will refer to these directories further on in this document. Note that <DAISY_HOME> is the new extracted download (see later on), not the old one.

2.11.1.2 Stop your existing Daisy

Stop your existing Daisy, both the repository server and the Daisy Wiki.

2.11.1.3 Download and extract Daisy 2.1

If not done already, download Daisy 2.1 from the distribution area (Sourceforge). For Windows, download the zip or autoextract.exe (not the installer!). For Unix-based systems, the .tar.gz is recommended. The difference is that the .zip contains text files with DOS line endings, while the .tar.gz contains text files with unix line endings. When using non-Linux unixes such as Solaris, be sure to use GNU tar to extract the archive.

Extract the download at a location of your choice. Extract it next to your existing Daisy installation; do not copy it over your existing installation.

2.11.1.4 Update environment variables

Make sure the DAISY_HOME environment variable points to the just-extracted Daisy 2.1 directory.

Note that when you start/stop Daisy using the wrapper scripts, you don't need to set DAISY_HOME, though you do need to update or re-generate the service wrapper configuration (see next section).

How this is done depends a bit on your system and personal preferences:

2.11.1.5 Creating log configuration

Daisy now uses log4j for logging, which needs a new configuration file in the repository data directory.

Therefore copy the file

<DAISY_HOME>/repository-server/conf/repository-log4j.properties

to

<REPO DATA DIR>/conf/

2.11.1.6 Updating the repository SQL database

Execute the database upgrade script:

cd <DAISY_HOME>/misc
mysql -Ddaisyrepository -udaisy -ppassword < daisy-2_0-to-2_1.sql

On many MySQL installations you can use "root" as user (thus specify -uroot instead of -udaisy) without password, thus without the -p option.

2.11.1.7 Adjusting the daisy.xconf file

Open the following file in a text editor:

<wiki data dir>/daisy.xconf

At the end of this file, before the closing </cocoon> tag, add these lines:

<component
    class="org.outerj.daisy.frontend.GuestRepositoryProviderImpl"
    role="org.outerj.daisy.frontend.GuestRepositoryProvider"
    logger="daisy">
    <guestUser login="guest" password="guest"/>
</component>

2.11.1.8 ActiveMQ configuration

The repository-server-init script of Daisy 2.0(.1) made an error in the ActiveMQ configuration. If you upgraded your 2.0 from earlier releases, the configuration should be OK, but there's no harm in checking it anyhow.

Open the following file in a text editor:

<daisydata dir>/conf/activemq-conf.xml

If you find the following line in that file, remove it:

<property name="poolPreparedStatements" value="true"/>

2.11.1.9 Jetty configuration (only when using a custom jetty-daisywiki.xml)

If you have a custom jetty-daisywiki.xml in your wikidata directory, it will need updating because Daisy 2.1 contains a major new Jetty version (6.1.3).

The easiest is probably to start from the new default jetty-daisywiki.xml found at

<DAISY_HOME>/daisywiki/conf/jetty-daisywiki.xml

and change what you want to change (usually just the HTTP port number).

The new default jetty-daisywiki.xml enables request logging by default. You might want to disable this if you have a webserver in front which also does request logging.

2.11.1.10 Wrapper scripts

This section is only applicable if you are using the wrapper scripts.

Various updates have been done to the wrapper scripts.

An important difference is that the wrapper scripts now require DAISY_HOME to be set.

Please see the wrapper documentation on how to regenerate the service wrapper scripts.

2.11.1.11 Start the servers

Make sure the DAISY_HOME environment variable points to the new Daisy 2.1 directory (you might want to rename the old directory so that it is not used by accident).

2.11.1.12 Update the default repository schema

There are some new schema types, therefore update the repository schema by running the daisy-wiki-init script:

[Windows]
cd <DAISY_HOME>\install
daisy-wiki-init

[Linux]
cd <DAISY_HOME>/install
./daisy-wiki-init

2.12 2.1-RC to 2.1 upgrade

2.12.1 Changes since 2.1-RC

2.12.2 Upgrade instructions

These are the upgrade instructions for when you currently have Daisy 2.1-RC installed.

This release requires no special upgrade steps, besides putting the new Daisy distribution in place.

In case you have problems during the upgrade or notice errors or shortcomings in the instructions below, please let us know on the Daisy mailing list.

2.12.2.1 Daisy installation review

In case you're not very familiar with Daisy, it is helpful to identify the main parts involved. The following picture illustrates these.

There is the application directory, which is simply the extracted Daisy download, and doesn't contain any data (to be safe don't remove it yet though).

Next to this, there are 3 locations where data (and configuration) is stored: the relational database (MySQL), the repository data directory, and the wiki data directory. The Daisy repository and the Daisy Wiki are two independent applications, therefore each has its own data directory.

The text between the angle brackets (< and >) is the way we will refer to these directories further on in this document. Note that <DAISY_HOME> is the new extracted download (see later on), not the old one.

2.12.2.2 Stop your existing Daisy

Stop your existing Daisy, both the repository server and the Daisy Wiki.

2.12.2.3 Download and extract Daisy 2.1

If not done already, download Daisy 2.1 from the distribution area (Sourceforge). For Windows, download the zip or autoextract.exe (not the installer!). For Unix-based systems, the .tar.gz is recommended. The difference is that the .zip contains text files with DOS line endings, while the .tar.gz contains text files with unix line endings. When using non-Linux unixes such as Solaris, be sure to use GNU tar to extract the archive.

Extract the download at a location of your choice. Extract it next to your existing Daisy installation; do not copy it over your existing installation.

2.12.2.4 Update environment variables

Make sure the DAISY_HOME environment variable points to the just-extracted Daisy 2.1 directory.

Note that when you start/stop Daisy using the wrapper scripts, you don't need to set DAISY_HOME, though you do need to update or re-generate the service wrapper configuration (see next section).

How this is done depends a bit on your system and personal preferences:

2.12.2.5 Start Daisy

Start Daisy using the normal scripts or the wrapper scripts.

3 Source Code

Sources can be obtained through SVN. Instructions for setting up a development environment with Daisy (which is slightly different from using the packaged version) are included in the README.txt files in the source tree. For anonymous, read-only access to Daisy SVN, use the following command:

svn co http://svn.cocoondev.org/repos/daisy/trunk/daisy

This will give the latest development code (the "trunk"). To get the source code of a specific release, use a command like this:

svn co http://svn.cocoondev.org/repos/daisy/tags/RELEASE_1_3_1 daisy

See also the existing tags.

No authentication is required for anonymous access. If you're behind a (transparent) proxy, you might want to verify whether your proxy supports the extended HTTP WebDAV methods.

3.1 Daisy Build System

We should consider removing this document, Maven is common enough these days.

The build system used by Daisy is Maven, an Apache project.

3.1.1 Maven intro

What follows is the very-very-quick Maven intro, for those not familiar with Maven.

Unlike Ant, where you tell how your code should be built, in Maven you simply tell what directory contains your code, and what the dependencies are (i.e. what other jars it depends on), and it will build your code. This information is stored in the project.xml files that you'll see across the Daisy source tree. There are a lot of them, since Daisy is actually composed of a whole lot of mini-projects, whereby some of these projects depend on one or more of the others.

An important concept of Maven is the repository, which is a repository of so-called artifacts, usually jar files. An artifact in the repository is identified uniquely by a group id and an id (both are simply descriptive names). Declaring the dependencies of a project is done by specifying repository references, thus for each dependency you specify the group id and id of the dependency. An example dependency declaration, as defined in a project.xml file:

    <dependency>
      <groupId>lucene</groupId>
      <artifactId>lucene</artifactId>
      <version>1.3</version>
    </dependency>

So where does the repository physically exist? Well, there can be many repositories. The most important public one is on ibiblio:

http://www.ibiblio.org/maven/

The repository is simply accessed using HTTP, so you can take your browser and surf to that URL. A repository like the one on ibiblio is called a remote repository. After initially downloading an artifact from the remote repository, it is installed in your local repository, which is by default located in ~/.maven/repository.

When you build a project, the result of the build is usually a jar file. Maven will install this jar file in your local repository, so that when you build another project that depends on this jar file, it can be found over there. When searching a dependency, Maven always checks the local repository first, and then goes off checking remote repositories. Which remote repositories are searched is of course configurable.
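
The lookup order described above (local repository first, then the remote repositories, with downloaded artifacts kept locally) can be modelled in a few lines. This is a simplified sketch and not Maven code: repositories are plain in-memory sets, and the artifact naming "lucene/lucene/1.3" is an arbitrary convention chosen for the example.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Simplified model of the lookup order. This is not Maven code: repositories
// are plain in-memory sets of artifact names.
class ArtifactLookup {
    final Set<String> local = new HashSet<String>();
    // Remote repositories in the order they are searched (name -> contents).
    final Map<String, Set<String>> remotes = new LinkedHashMap<String, Set<String>>();

    // Returns "local", the name of the remote repository that supplied the
    // artifact, or null if no repository has it.
    String resolve(String artifact) {
        if (local.contains(artifact)) {
            return "local";
        }
        for (Map.Entry<String, Set<String>> remote : remotes.entrySet()) {
            if (remote.getValue().contains(artifact)) {
                local.add(artifact); // downloaded once, then kept locally
                return remote.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ArtifactLookup lookup = new ArtifactLookup();
        Set<String> ibiblio = new HashSet<String>();
        ibiblio.add("lucene/lucene/1.3");
        lookup.remotes.put("ibiblio", ibiblio);

        System.out.println(lookup.resolve("lucene/lucene/1.3")); // ibiblio
        System.out.println(lookup.resolve("lucene/lucene/1.3")); // local
    }
}
```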

I should also tell you something about the build.properties and project.properties files. Both files contain properties for the build and configuration for Maven. The difference is that the project.properties files are committed to the source repository (SVN in Daisy's case), while the build.properties files are intended for local customisations (thus on your computer). So if you see something in a project.properties file that you'd like to change, don't change it over there (as this will otherwise show up as a modified file when doing svn status), but do it in the build.properties file. The build.properties file thus has a higher precedence than the project.properties file.
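
The precedence rule can be illustrated with java.util.Properties, whose defaults mechanism behaves the same way. This is a sketch of the rule only, not of how Maven actually loads these files, and the property names are invented for the example.

```java
import java.util.Properties;

class PropertyPrecedence {
    // Builds the effective settings: values from build.properties win, and
    // project.properties supplies the defaults for anything not overridden.
    static Properties effective(Properties projectProps, Properties buildProps) {
        Properties result = new Properties(projectProps); // used as defaults
        for (String key : buildProps.stringPropertyNames()) {
            result.setProperty(key, buildProps.getProperty(key));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical property names, standing in for project.properties.
        Properties project = new Properties();
        project.setProperty("example.compile.debug", "off");
        project.setProperty("example.output.dir", "target");

        // Local customisation, standing in for build.properties.
        Properties build = new Properties();
        build.setProperty("example.compile.debug", "on");

        Properties settings = effective(project, build);
        System.out.println(settings.getProperty("example.compile.debug")); // on
        System.out.println(settings.getProperty("example.output.dir"));    // target
    }
}
```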

There is a lot more to tell about Maven, such as that it is actually composed of a whole lot of plugins, that there is something like "goals" to execute, that there is the possibility to have a maven.xml file to define custom goals with custom build instructions, and that all artifacts are also versioned. But I'll let you explore the Maven documentation to learn about that.

3.1.2 Extra dependencies

Daisy has some dependencies on artifacts (remember, jar files) that are not available in the public ibiblio repository. We make these available in our own repository on http://cocoondev.org/repository/.

3.1.3 Building Daisy

Instructions for building Daisy can be found in the README.txt file in the root of the Daisy source tree. At some point it will tell you to execute maven in the root of the source tree, which will actually build all the little mini-projects of which Daisy consists, in the correct sequence so that all dependencies are satisfied.

4 Repository server

The repository server is the core of Daisy. It provides the pure content management functionality without GUI (graphical user interface).

The main purpose of the repository is managing documents.

The repository server consists of a core and some non-essential extension components that add additional functionality. The repository can be accessed by a variety of client applications (such as web applications, command-line tools, desktop applications, ...) through its programming interfaces.

4.1 Documents

4.1.1 Introduction

The purpose of the Daisy Repository Server is managing documents. The main content of a document is contained in its so-called parts and fields. Parts contain arbitrary binary data (e.g. an XML document, a PDF file, an image). Fields contain simple information of a certain data type (string, date, decimal, ...).

The diagram below gives an overview of the document structure, which is explained in more detail in the following sections.

4.1.2 No hierarchy

Daisy has no folders or directories like a filesystem; all documents are stored in one big bag. When saving a document, you only have to choose a name for it (which in fact acts as the title of the document), and this name is not even required to be unique (see below). Documents are retrieved by searching or browsing. Front-end applications like the Daisy Wiki allow you to define multiple hierarchical views on the same set of repository documents.

4.1.3 Documents & document variants

A document can exist in multiple variants, e.g. in multiple languages. A document in itself does not consist of much, most of the data is contained in the document variants. From another point of view (which closer matches the implementation), one could say that the repository server actually manages document variants, which happen to share a few properties (most notably their identity) through the concept of a document.

A document always has at least one document variant; a document cannot exist by itself without variants.

A document is identified uniquely by its ID, a document variant is identified by the triple {document ID, branch, language}.

If you are not interested in using variants, you can mostly ignore them. In that case each document will always be associated with exactly one document variant. Therefore, often when we speak about a document in Daisy, we implicitly mean "a certain variant of a document" (a "document variant"). In a practical working environment like the Daisy Wiki, the branch and language which identify the particular variant of the document are usually a given (Daisy Wiki: configured per site), and you'll only work with document IDs, so it is as if the existence of variants is transparent.

Refer to the diagram above to see if a certain aspect applies to a document, a document variant, or a version of a document variant.

For more details on this topic, see variants.
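
The identification scheme can be pictured as a composite key. The class below is a hypothetical value class written for this illustration, not part of the Daisy API, and the branch and language names used are examples only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical value class, not the Daisy API: models the triple
// {document ID, branch, language} that identifies a document variant.
final class VariantKey {
    final String documentId;
    final String branch;
    final String language;

    VariantKey(String documentId, String branch, String language) {
        this.documentId = documentId;
        this.branch = branch;
        this.language = language;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof VariantKey)) return false;
        VariantKey other = (VariantKey) o;
        return documentId.equals(other.documentId)
                && branch.equals(other.branch)
                && language.equals(other.language);
    }

    @Override
    public int hashCode() {
        int h = documentId.hashCode();
        h = 31 * h + branch.hashCode();
        h = 31 * h + language.hashCode();
        return h;
    }

    public static void main(String[] args) {
        // One document ID, two variants that differ only in language.
        Map<VariantKey, String> variants = new HashMap<VariantKey, String>();
        variants.put(new VariantKey("1-FOO", "main", "en"), "English text");
        variants.put(new VariantKey("1-FOO", "main", "fr"), "French text");

        System.out.println(variants.size());
        System.out.println(variants.get(new VariantKey("1-FOO", "main", "fr")));
    }
}
```

When variants are not used, the branch and language are fixed, so the document ID alone is enough to find the single variant.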

4.1.4 Document properties

4.1.4.1 ID

When a document is saved for the first time, it is assigned a unique ID. The ID is the combination of a sequence counter and the repository namespace. If the repository namespace is FOO, then the first document will get ID 1-FOO, the second 2-FOO, and so on. The ID of a document never changes.
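
As an illustration of this ID format, the sketch below builds and splits IDs such as 1-FOO. It is a hypothetical helper, not Daisy code, and it assumes the ID is split at its first hyphen; the rules for which characters a namespace may contain are not covered here.

```java
// Hypothetical helper, not Daisy code: builds and splits IDs of the form
// <sequence counter>-<repository namespace>, such as "1-FOO".
class DocumentIdFormat {

    // Combines a sequence counter and a namespace, e.g. (1, "FOO") -> "1-FOO".
    static String format(long sequence, String namespace) {
        return sequence + "-" + namespace;
    }

    // Splits an ID at its first '-' and returns {sequence, namespace}.
    // Throws IllegalArgumentException if the ID does not have that shape.
    static String[] parse(String id) {
        int dash = id.indexOf('-');
        if (dash <= 0 || dash == id.length() - 1) {
            throw new IllegalArgumentException("Not a document ID: " + id);
        }
        String sequence = id.substring(0, dash);
        Long.parseLong(sequence); // rejects a non-numeric counter
        return new String[] { sequence, id.substring(dash + 1) };
    }

    public static void main(String[] args) {
        System.out.println(format(1, "FOO")); // 1-FOO
        System.out.println(format(2, "FOO")); // 2-FOO
        String[] parts = parse("2-FOO");
        System.out.println(parts[0] + " / " + parts[1]);
    }
}
```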

4.1.4.2 Owner

The owner of a document is a person who is always able to access (read/write) the document, regardless of what the ACL specifies. The owner is initially the creator of the document, but can be changed afterwards.

4.1.4.3 Created

The date and time when the document was created. This value never changes.

4.1.4.4 Last Modified and Last Modifier

Each time a document is saved, the user performing the save operation is stored as the last modifier, and the date and time of the save operation as the "last modified" timestamp.

Note that each document variant has its own last modified and last modifier properties, which are usually more interesting: the last modified and last modifier of the document are only updated when some of the shared document properties change.

4.1.5 Document variant properties

4.1.5.1 Versions

A document consists of versioned and non-versioned data. Versioned data means that each time the document is saved (and some of the versioned aspects of the document changed), a new version will be stored, so that the older state of the data can still be viewed afterwards.

It hence provides a history of who made what changes at what time. It also allows you to work on newer versions of a document while an older version stays the live version, as explained in version state.

4.1.5.2 Versioned Content

The versioned content of a document consists of the following:

So if any changes are made to any of these, and the document is stored, a new version is created.

4.1.5.2.1 Version ID

Each version has an ID, which is simply a numeric sequence number: the first version has number 1, the next number 2, and so on.
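
The versioning behaviour can be sketched with a small in-memory model: a save creates a new version only when the versioned data changed, and version IDs count up from 1. This is a simplification for illustration, not the repository implementation; the versioned data is reduced to a single string and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model, not the repository implementation: the versioned data of
// a document variant is reduced to a single string.
class VersionHistory {
    private final List<String> versions = new ArrayList<String>();

    // Stores the content as a new version if it differs from the latest one.
    // Returns the ID of the version that now holds this content.
    int save(String versionedContent) {
        if (!versions.isEmpty()
                && versions.get(versions.size() - 1).equals(versionedContent)) {
            return versions.size(); // nothing versioned changed: no new version
        }
        versions.add(versionedContent);
        return versions.size(); // IDs are 1, 2, 3, ...
    }

    // Returns the content stored under a version ID (IDs start at 1).
    String get(int versionId) {
        return versions.get(versionId - 1);
    }

    public static void main(String[] args) {
        VersionHistory history = new VersionHistory();
        System.out.println(history.save("first draft"));  // 1
        System.out.println(history.save("first draft"));  // 1 (unchanged)
        System.out.println(history.save("second draft")); // 2
        System.out.println(history.get(1));               // first draft
    }
}
```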

4.1.5.2.2 Document Name

The name of a document is required (it cannot be empty). The name is not required to be unique. Thus there can be multiple documents with the same name. The ID of the document is its unique identification.

The name is usually rendered as the title of the document.

4.1.5.2.3 Parts

A part contains arbitrary binary data. "Binary data" simply means that it can be any sort of information, such as plain text, XML or HTML, an image, a PDF or OpenOffice document.

In contrast with many repositories or file systems, a Daisy document can contain multiple parts. This allows storing different types of data in one document (e.g. text and an image), and makes these parts separately retrievable.

For example, one could have a document with a part containing an abstract and a part containing the main text. It is then very easy and efficient to show a page with the abstracts of a set of documents.

As another example, a document for an image could contain a part with the rendered image (e.g. as PNG), a part with a thumbnail image and a part with the source image file (e.g. a PhotoShop or SVG file).

The parts that can be added to a document are controlled by its document type.

Each part:

4.1.5.2.4 Fields

Fields contain simple information of a certain data type (string, date, decimal, ...). Depending on how you look at it, fields could be metadata about the data stored in the parts, or can be data by themselves.

One of the data types supported for fields is link, which allows the field to contain a link to another Daisy document. Link-type fields are useful for defining structured links (associations) between documents. For example, you could have documents describing wines, and other documents describing regions. Using a link-type field you can connect a wine to a region. By having this association in a field, it is easy to perform searches such as all wines associated with a certain region. By means of the Publisher, the Daisy Wiki can aggregate data from linked documents when displaying a document, which, combined with some custom styling, makes very interesting things possible.

Fields can be multi-valued. The order of the values in a multi-value field is maintained. The same value can appear more than once.

A field can be hierarchical, meaning that its value represents a hierarchical path. A field can be multi-value and hierarchical at the same time.

The fields that can be added to a certain document are specified by its document type.

Each field:

A document can contain links in the content of parts (for example, an <a> element in HTML) or in link-type fields. Next to this a document can have a number of so-called out-of-line links. These are links stored separately from the content. Each link consists of a title and a target (some URL). These links are usually rendered at the bottom of a page as a bulleted list.

Out-of-line links are useful in case you want to link to related documents (or any URL) and either don't want or can't (e.g. in case of non-HTML content) link to them from the content of a part.

4.1.5.2.6 Version state & the live version

Each version can have a state indicating whether it is a draft version (i.e. you started editing the document but are not finished yet, in other words the changes should not yet be published), or a publishable version. The most recent version having the state 'publish' becomes the live version. The live version is the version that is typically shown by default to the user. It is also the version whose data is indexed in the full-text index, and whose properties are used by default when querying. The ACL makes it possible to restrict users' access to only the live versions of documents.
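
The rule for choosing the live version can be sketched in a few lines of Python. This is an illustrative model only, not Daisy's implementation: the Version class and the state strings "draft" and "publish" are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Version:
    """Hypothetical stand-in for a document version (not a Daisy API class)."""
    id: int      # sequence number: 1, 2, 3, ...
    state: str   # assumed to be either "draft" or "publish"


def live_version(versions: Sequence[Version]) -> Optional[Version]:
    """Return the most recent version in state 'publish', or None if there is none."""
    published = [v for v in versions if v.state == "publish"]
    return max(published, key=lambda v: v.id, default=None)


history = [Version(1, "publish"), Version(2, "publish"), Version(3, "draft")]
print(live_version(history))  # version 2 stays live while version 3 is a draft
```

A variant whose versions are all drafts has no live version in this model, which is why the function returns None in that case.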

4.1.5.3 Non-versioned properties

4.1.5.3.1 Document type

Each document is associated with a document type, describing the parts and fields the document can contain. See repository schema for more information on document types.

4.1.5.3.2 Collections and collection membership

Collections are sets of documents. A document can belong to zero, one or more collections, thus collections can overlap. A collection is simply a way to combine some documents in order to do something with them or treat them in some special way. In other words, they are a sort of built-in (always present) metadata to identify a set of documents.

Collections themselves can be created or deleted only by Administrators (in the Daisy Wiki, this is done in the administration interface). Deleting a collection does not delete the documents in it. You can limit who can put documents in a collection by ACL rules.

4.1.5.3.3 Custom fields

Custom fields are arbitrary name-value pairs assigned to a document. The name and value are both strings. In contrast with the earlier-mentioned fields that are part of the document type, these fields are non-versioned. This makes it possible to stick tags to documents without causing a new version to be created, and without formally defining a field type.

4.1.5.3.4 Private

A document marked as private can only be read (and written) by its owner.

While the global access control system of Daisy makes it easy to centrally handle access control for sets of documents, sometimes it could be useful to simply say "I want nobody else to see this (for now)". This can be done by enabling the private flag. The document will then not be accessible for others, and also won't turn up in search results done by others. The private flag can be set on or off at any time, by the owner or by an Administrator.

There is however one big exception: Administrators can always access all documents, and thus will be able to read your "private" documents. The content is not encrypted.

4.1.5.3.5 Retired

If a document variant is no longer needed, because its content is outdated, replaced by others, or whatever, you can mark the document variant as retired. This makes the document variant virtually deleted. It won't show up in search results anymore.

The retired flag can be set on or off at any time; retiring is not a one-time operation.

4.1.5.3.6 Lock

A lock can be taken on a document variant to make sure nobody else edits the document variant while you're working on it.

Daisy automatically performs so-called optimistic locking. This means that if person A starts editing the document, then person B starts editing it, then person A saves the document, and finally person B tries to save it, this last operation will fail because the document has changed since person B loaded it. This mechanism is always enabled; it is not necessary to take an explicit lock.
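
The A/B scenario can be illustrated with a toy in-memory store. This is a minimal sketch of the general optimistic locking technique, assuming an update counter is compared at save time; the class and method names are hypothetical and do not correspond to the Daisy repository API.

```python
class ConcurrentUpdateError(Exception):
    """Raised when a document changed after the caller loaded it."""


class DocumentStore:
    """Toy in-memory store (hypothetical, for illustration only)."""

    def __init__(self):
        self._content = ""
        self._update_count = 0  # incremented on every successful save

    def load(self):
        """Return the content together with the update count seen at load time."""
        return self._content, self._update_count

    def save(self, new_content, loaded_update_count):
        """Store new content only if nobody saved since the caller loaded."""
        if loaded_update_count != self._update_count:
            raise ConcurrentUpdateError("document was modified since it was loaded")
        self._content = new_content
        self._update_count += 1


store = DocumentStore()
_, seen_by_a = store.load()             # person A starts editing
_, seen_by_b = store.load()             # person B starts editing
store.save("edit by A", seen_by_a)      # A saves first: succeeds
try:
    store.save("edit by B", seen_by_b)  # B's copy is stale: fails
except ConcurrentUpdateError as err:
    print("save rejected:", err)
```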

A lock can additionally be taken to make others aware that you are editing the document. A lock can be of two types: an exclusive lock or a warn lock. An exclusive lock is pretty much what its name implies: it is a lock held exclusively by the user who requested it, and it prevents anyone else from saving the document until the lock is released. A warn lock isn't really a lock; it is just an informational mechanism to let others know that someone else has also started to edit the document, but it doesn't enforce anything. Anyone else can still save the document at any time or replace the lock with their own.

A lock can optionally have a certain duration; once the duration has expired, the lock is automatically removed.

For example, the Daisy Wiki application by default uses exclusive locks with a duration of 15 minutes, and automatically extends them as long as the user continues editing.

A lock can be removed either by the person who created it, or by an Administrator.

4.1.5.3.7 Last Modified and Last Modifier

Each time a document is saved, the user performing the save operation is stored as the last modifier, and the date and time of the save operation as the "last modified" timestamp. These will often coincide with the Created/Creator fields of the last version, but not necessarily so: if only non-versioned properties are changed, no new version will be created.

4.2 Repository schema

4.2.1 Overview

The repository schema controls the structure of documents.

The repository schema defines part types, field types and document types. A document type is a combination of zero or more part types and zero or more field types. Part and field types are defined as independent entities, meaning that the same part and field types can be reused across different document types. The diagram below shows the structure and relation of all these entities.

4.2.1.1 Common aspects of document, part and field types

Let us first look at the things document, part and field types have in common. Their primary, unchangeable identifier is a numeric ID, though they also have a unique name (which can be changed after creation), which you will likely prefer to use.

Next to the name, they can optionally be assigned a localized label and description. Localized means that a different label and description can be given for different locales. A locale can be a language, language-country, or language-country-variant specification. For example, a label entered for the locale "fr-BE" would mean it is in French, and specifically for Belgium. The labels and descriptions are retrieved using a fallback system. For example, if the user's locale is "fr-BE", the system will first check whether a label is available for "fr-BE"; if none is found it will check "fr", and finally the empty locale "". Thus if you want to provide labels and descriptions but are not interested in localization, you can simply enter them for the empty locale.
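
The fallback order can be sketched as follows. This is an illustration of the lookup rule described above, not Daisy's code; the function names and the example labels are made up.

```python
def fallback_chain(locale):
    """Yield the locale, then each shorter prefix, ending with the empty locale."""
    parts = locale.split("-") if locale else []
    while parts:
        yield "-".join(parts)
        parts.pop()
    yield ""


def lookup_label(labels, locale):
    """Return the first label found along the fallback chain, or None."""
    for candidate in fallback_chain(locale):
        if candidate in labels:
            return labels[candidate]
    return None


labels = {"": "Abstract", "fr": "Résumé"}
print(list(fallback_chain("fr-BE")))   # ['fr-BE', 'fr', '']
print(lookup_label(labels, "fr-BE"))   # Résumé
print(lookup_label(labels, "nl"))      # Abstract
```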

Document, part and field types cannot be deleted as long as they are still in use in the repository. Once a document has been created that uses one of these types, the type can thus not be deleted anymore (unless the documents using it are deleted). However, it is possible to mark a type as deprecated to indicate it should not be used anymore. This deprecation flag is purely informational: the system simply stores it.

4.2.1.2 Document types

A document type combines a number of part types and field types. The associations with the part and field types, shown in the diagram as "Part Type Use" and "Field Type Use", are not stand-alone entities but part of the document type.

The associations have a property to indicate whether or not the parts and fields are required to have a value.

The associations also have a property called 'editable'. This property is a hint towards the document editing GUI that the part or field should not be editable. This is just a GUI hint, not an access control restriction. This can for example be useful if the values of certain fields or parts are assigned by an automated process.

4.2.1.3 Part types

A part type defines a part that can be added to a document.

4.2.1.3.1 Mime-type

A part type can restrict which types of data (thus which mime-types) may be stored in the part, but this is not required. The restriction is made by specifying a list of allowed mime-types.

4.2.1.3.2 The Daisy HTML flag

A part type has a flag indicating whether the part contains "Daisy HTML". Daisy HTML is basically HTML formatted as well-formed XML (with element and attribute names lowercased). It is not the same as XHTML, because the elements are not in the XHTML namespace. If the "Daisy HTML" flag is set to true, the mime-type should be limited to text/xml. For the repository server, the Daisy-HTML flag on the part type has little meaning. Currently it serves only to enable the creation of document summaries (which might even be replaced with a more flexible mechanism in the future). The Daisy Wiki front end application will show a wysiwyg editor for Daisy HTML parts, and display the content of such parts inline.

4.2.1.3.3 Link extraction

For each part type a link extractor can be defined to extract links from the content contained in the part. The most common link extractor is the "daisy-html" one, which will extract links from the href attribute of the <a> element, the src attribute of the <img> element, and the character content of <p class="include">. The format of the links is:

daisy:<document id>
or
daisy:<document id>@<branch id or name>:<language id or name>:<version id>#fragment_id

Links that don't conform to this form will be ignored. The <version id> can take the special value "LAST" (case insensitive). A link without a version specification denotes a link to the live version of the document. The branch, language, version and fragment ID parts are all optional. For example, daisy:15@:nl is a link to the Dutch language variant of document 15.
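
As an illustration of this link format, the sketch below splits a daisy: link into its components with a regular expression. The exact grammar accepted by Daisy's own link extractor is not spelled out here, so the character classes in the pattern are assumptions, and parse_daisy_link is a hypothetical helper rather than part of any Daisy API.

```python
import re

# daisy:<document id>[@<branch>[:<language>[:<version>]]][#<fragment>]
DAISY_LINK = re.compile(
    r"^daisy:(?P<document>[^@#:]+)"
    r"(?:@(?P<branch>[^:#]*)(?::(?P<language>[^:#]*))?(?::(?P<version>[^:#]*))?)?"
    r"(?:#(?P<fragment>.*))?$"
)


def parse_daisy_link(href):
    """Return the link components as a dict, or None if href is not a daisy: link."""
    match = DAISY_LINK.match(href)
    if match is None:
        return None
    # Treat empty components (as in "daisy:15@:nl") the same as absent ones.
    return {name: (value or None) for name, value in match.groupdict().items()}


print(parse_daisy_link("daisy:15@:nl"))
print(parse_daisy_link("daisy:2583-AWAN@main:en:LAST#intro"))
print(parse_daisy_link("http://example.org/"))  # None: not a daisy: link
```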

The repository server also has link extractors for extracting links from navigation and book definition documents.

4.2.1.4 Field types

A field type defines a field that can be added to a document.

4.2.1.4.1 Value Type

The most important thing a field type tells about a field is its value type. A value type identifies the kind of data that can be stored in a field. The available value types are listed in the table below, together with their matching Java class.

Value type name    Corresponding Java class
string             java.lang.String
date               java.util.Date
datetime           java.util.Date
long               java.lang.Long
double             java.lang.Double
decimal            java.math.BigDecimal
boolean            java.lang.Boolean
link               org.outerj.daisy.repository.VariantKey

The link type is somewhat special: it defines a link to another document variant. Its value is thus a triple (document ID, branch ID, language ID). The branch ID and language ID are optional (value -1 in the VariantKey object) to denote they should default to the same as the containing document (in other words, the branch and language are relative to the document). The branch and language will usually be unspecified, since this allows copying content between the variants while the links stay relative to the actual variant.
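
The relative resolution of unspecified branch and language values can be sketched as follows. The VariantKey below is a simplified Python stand-in written for this example, not the Java class, and the document IDs are invented.

```python
from typing import NamedTuple

UNSPECIFIED = -1  # the value the text above gives for "not specified"


class VariantKey(NamedTuple):
    """Simplified stand-in for the (document ID, branch ID, language ID) triple."""
    document_id: str
    branch_id: int
    language_id: int


def resolve(link: VariantKey, containing: VariantKey) -> VariantKey:
    """Fill unspecified branch/language from the variant that contains the link."""
    branch = containing.branch_id if link.branch_id == UNSPECIFIED else link.branch_id
    language = containing.language_id if link.language_id == UNSPECIFIED else link.language_id
    return VariantKey(link.document_id, branch, language)


link = VariantKey("15-DSY", UNSPECIFIED, UNSPECIFIED)
print(resolve(link, VariantKey("7-DSY", 1, 1)))  # resolved against branch 1, language 1
print(resolve(link, VariantKey("7-DSY", 2, 3)))  # same link value, copied to branch 2, language 3
```

Because the stored value is identical in both cases, copying the field to another variant needs no rewriting of the link.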

4.2.1.4.2 Multi-value

The multi-value property of a field type indicates whether the fields of that type can have multiple values. All the values of a multi-value field should be of the same value type.

A multi-value field can contain the same value more than once, and the order of its values is maintained. Thus the values of a multi-value field form an ordered list.

In the Java API, a multi-value value is represented as an Object[] array, in which the entries are objects of the type corresponding to the field's value type (e.g. an array of Strings, or an array of Longs).

4.2.1.4.3 Hierarchical

The hierarchical property of a field type indicates that the value of the fields of that type is a hierarchical path (a path in some hierarchy). A path is often represented as a slash-separated string, e.g. Animals/Four-legged/Dogs.

Hierarchical fields are technically quite similar to multi-value fields, because a hierarchical path is also an ordered set of values. It is however possible for a field type to be both hierarchical and multi-value at the same time.

In the Java API, a hierarchical value is represented by a HierarchyPath object:

org.outerj.daisy.repository.HierarchyPath

A multi-value hierarchical value is an array (Object[]) of HierarchyPath objects.

4.2.1.4.4 Selection Lists

It is possible to define a selection list for a field type. This is a list of possible values that an end user can choose from when completing the field. There are multiple selection list types available:

The hierarchical selection lists can be used both for hierarchical and non-hierarchical fields. For hierarchical fields, the whole path leading to the selected node is stored; for non-hierarchical fields, only the selected node is stored.

4.2.1.4.5 ACL allowed flag

In the access control system, it is possible to define access rules for documents by using an expression to select the documents to which the access rules apply. In these expressions, it is also possible to check the value of fields, but only of fields whose field types' ACL allowed flag is set to true. The ACL allowed flag also enables the front-end to warn that changing the value of that particular field can influence the access control checks.

4.2.1.4.6 Size hint

A field can have a size hint, which is simply an integer number. This information is used by the front end to display an input field of an appropriate width. The repository server doesn't attach any further meaning to it: it doesn't cause any validation to happen, nor does it specify the unit of the width (most likely "number of characters").

4.2.1.5 Document and document type association, how changes to document types are handled

Upon creation of a document, a document type must be supplied. When saving a document, the repository will check that the document conforms to its document type. Thus it will check that all required fields and parts are present, and that there are no parts and fields in the document that are not allowed by the document type.
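
The conformance check described above amounts to two tests per part or field: required ones must be present, and nothing outside the document type may be present. The sketch below illustrates this with plain dictionaries and sets; the data layout and the type names (MainText, Region, Vintage) are hypothetical and chosen only for the example.

```python
def conformance_errors(document, document_type):
    """Return a list of problems; an empty list means the document conforms.

    `document` maps "parts"/"fields" to the set of names present in the document.
    `document_type` maps "parts"/"fields" to {name: required_flag}.
    """
    errors = []
    for kind in ("parts", "fields"):
        allowed = document_type[kind]
        present = document[kind]
        for name, required in sorted(allowed.items()):
            if required and name not in present:
                errors.append(f"missing required {kind[:-1]}: {name}")
        for name in sorted(present - set(allowed)):
            errors.append(f"{kind[:-1]} not allowed by document type: {name}")
    return errors


doc_type = {"parts": {"MainText": True}, "fields": {"Region": False}}
document = {"parts": set(), "fields": {"Region", "Vintage"}}
for problem in conformance_errors(document, doc_type):
    print(problem)
```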

The document type of a document can be changed at any time. This is useful if you start out with a generic document type but later want to switch to a more specialized document type.

The definition of a document type can be changed at any time. Part and field types can be added or removed from it, or can be made required. A logical question that pops up is what happens to existing documents in the repository that use the changed document type. The answer is basically "nothing". If for example a required field is added to a document type, then the next time a document of that type is edited, it will fail to save unless a value for the field is specified. The newly saved version of the document will then conform to the new state of the document type. Older versions of the document will remain unchanged however. When saving a document, it is also possible to supply an option that tells not to do the document type conformance check.

So basically the document type system doesn't give any guarantees about the structure of the documents in the repository, but rather hints at how the documents should be structured and interpreted.

See also the FAQ entry How do I change the document type of a set of documents?

4.3 Variants

4.3.1 Introduction

The variants feature of Daisy makes it possible to store multiple alternatives of a document in one logical document, thus identified by one unique ID.

Daisy supports variants along two axes:

For example, if there would not be a variants feature, and you had the same content in different languages, for each of these languages you would need to create a different document, thus with a different ID.

Language variants are quite obvious, but you may wonder what branches are. The purpose of branches is to have multiple parallel editable versions of the same content. As an example, take the Daisy documentation. Between major Daisy releases there might be quite some changes to the documentation. However, while creating the documentation of e.g. Daisy 1.3, we still want the ability to update the documentation of Daisy 1.2. Sure, this could be solved by duplicating all documentation documents for each new release, but then the identity of these documents would be lost since they get new IDs assigned, and the relationship between the documents in different releases would be lost.

4.3.2 Defining variants

By default, Daisy predefines one branch and one language variant: the branch main and the language default.

You can define additional ones yourself. In the Daisy Wiki you can do this via the administration screens.

The definition of a branch or language consists of a numeric ID (assigned by the repository server), a name and optionally a description. Internally, the ID is used, but towards the user mostly the name is shown.

The built-in main branch and default language each have ID 1.

Once a branch and/or language is defined, you can create new document variants using them.

Defining the branches and languages is something that can only be done by users who have the Administrator role, but adding variants to documents (which is almost the same as creating documents) can of course be done by any user, as far as the ACL allows the user to do so.

Deleting a branch or language definition is only possible when there are no more document variants for that branch or language. You can easily delete all document variants for a certain branch or language using the Document Task Manager, similarly to what is described further on for creating a variant across a set of documents.

4.3.3 Creating a variant on a document

When adding a new variant to a document, this can be done in two ways:

  1. from scratch
  2. based on the content of (a certain version of) an existing variant

When you opt for the second option (which is mostly done when creating branch-variants) then the (branch,language,version)-triple from which the content is taken will be stored as part of the new variant, so that later on you can see from where this variant "branched" (in the Daisy Wiki, this information is shown on the version list page).

In the Daisy Wiki, there is an "Add Variant" action for adding a new variant to a document.

4.3.4 Searching for non-existing variants

When translating a site, it can be useful to search which documents are not yet translated in a certain language. Similarly, it can be useful to see which documents exist on one branch but not on another. For this purpose, the query language provides a function called DoesNotHaveVariant(branch, language).

For example, to search on the Daisy site for all documents that have been added in the documentation of version 1.3 compared to 1.2, you can use the following query:

select id, name
  where
   InCollection('daisydocs')
   and branch = 'daisydocs-1_3' and language = 'en'
   and DoesNotHaveVariant('daisydocs-1_2', 'en')

4.3.5 Queries embedded in documents

When using queries embedded in documents together with variants, usually you will want to limit the query results to variants with the same branch and language as the one containing the query. You could specify these explicitly, as in:

select id, name where <conditions> and branch='my_branch' and language='my_lang'

However, this means that you will need to adjust these queries when adding new variants to the document. Especially if you are adding a certain branch to a set of documents, this is not something you want to do. Therefore, it should be possible to refer to the branch and language of the containing document. This can be done as follows:

select id, name where <conditions> and branchId = ContextDoc(branchId) and languageId = ContextDoc(languageId)

4.3.6 Creating a variant across a set of documents

When using branches, you will often want to add a variant for that branch to a set of documents (in other words: create a branch across a set of documents). To avoid the need to do this one-by-one for each document, Daisy has a "Document Task Manager" which allows the execution of a certain task on a set of documents. And that task could for example be "adding a new variant".

The Document Task Manager is covered in a separate section, here we will just focus on how to use it to create a new variant.

Before using the Document Task Manager, be sure you have defined the new branch (or language) using the administration screens.

In the Daisy Wiki, the Document Task Manager is accessed via the drop-down User-menu (in the main navigation bar). Select the option to create a new task. You are then first presented with a screen where you need to specify the documents (document variants actually) with which you want to do something. As you can see, it is possible to add documents using queries. For example, for the Daisy site, when we want to create a branch starting from the Daisy 1.2 documentation, we would use a query like:

select id, name where InCollection('daisydocs') and branch = 'daisydocs-1_2' and language = 'en'

Once you have selected the documents, press Next to go to the next page, where the action to be performed on the documents is specified. For Type of task choose Simple Actions. Then press the Add button to add a new action. Change the type of the action to Create Variant (if necessary), and specify the branch and language you want to create. Finally press Start to start the task. You can then follow the progress of this operation, and check whether it finished successfully for all documents.

4.4 Repository namespaces

Each Daisy repository (since Daisy 2.0) has a namespace. The documents created in that repository will by default belong to that namespace. The ID of a document is the combination of a numeric sequence and the namespace, for example:

2583-AWAN

Each repository server is responsible for maintaining the sequence number for the documents of its namespace. When a document is created with a foreign namespace (a namespace from another repository server), the sequence ID needs to be supplied, since it is assumed that another repository server is responsible for that namespace. On the other hand, when a document is created in the namespace of the repository server itself, a sequence ID cannot be supplied, as the repository server itself is then responsible for assigning the sequence ID.

A Daisy repository server doesn't really care whether foreign namespaces are really associated with other repository servers. The namespace could also come from an external application that creates documents (and maintains its own sequence counter), or it could come from a manually created 'export' that is imported using the import tool. So for foreign namespaces, it just assumes someone else is responsible for assigning unique sequence numbers.

4.4.1 Namespace name

A namespace can be up to 200 characters and contain the characters a-z, A-Z, 0-9 and _ (underscore). The dot character is not allowed to avoid confusion with file name extensions (as the document ID will often be used in URLs), and the dash character is not allowed to avoid confusion with the separator between the sequence number and the namespace.
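
The naming rules and the sequence-dash-namespace layout of a document ID can be checked with two regular expressions. This is an illustrative sketch, not the validation code used by the repository server; the helper names are hypothetical, and the minimum namespace length of one character is an assumption.

```python
import re

# Up to 200 characters from a-z, A-Z, 0-9 and underscore, per the rules above.
NAMESPACE = re.compile(r"[A-Za-z0-9_]{1,200}")
DOCUMENT_ID = re.compile(r"(?P<sequence>[0-9]+)-(?P<namespace>[A-Za-z0-9_]{1,200})")


def is_valid_namespace(name):
    """True if the name uses only allowed characters and is at most 200 long."""
    return NAMESPACE.fullmatch(name) is not None


def split_document_id(document_id):
    """Split '2583-AWAN' into (2583, 'AWAN'); raise ValueError on malformed input."""
    match = DOCUMENT_ID.fullmatch(document_id)
    if match is None:
        raise ValueError(f"not a namespaced document ID: {document_id!r}")
    return int(match.group("sequence")), match.group("namespace")


print(split_document_id("2583-AWAN"))   # (2583, 'AWAN')
print(is_valid_namespace("my.ns"))      # False: the dot is not allowed
print(is_valid_namespace("my-ns"))      # False: the dash is the ID separator
```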

The namespace name is typically a short string. This approach has been chosen to keep the document IDs short and readable. As a consequence, it requires some care to avoid conflicting namespaces within an organisation, and between organisations there is no control whatsoever.

If you want to avoid the possibility of conflicts and don't care about readability, it is always possible to use a longer string, for example a registered domain name or a randomly generated string (GUID). However, in general we recommend sticking with a short name.

The repository server installation tool gives some recommendations on namespace naming.

4.4.2 Namespace purpose

Namespaces have little purpose as long as documents are not exchanged between repositories. The main purpose of namespaces is to allow import/export (replication) of documents between repositories. If there were no namespaces, the document IDs would not be unique across repositories and hence there would be conflicts. For example, in both repositories there might be a document with ID 55, though these would be different documents. It would of course be possible to assign new IDs to documents upon import, but then the identity of the original document would be lost, which would make subsequent 'update' imports difficult, and would also require updating links in all document content (which would mean the import tool has to understand the document formats).

4.4.3 Namespace fingerprints

For each namespace, there is an associated namespace fingerprint. Since namespaces will usually be short strings, and since there might be people who choose the same namespace or simply use the proposed default (DSY), some additional verification is needed to ensure that two namespaces are really the same. For this purpose, each namespace name is associated with a fingerprint. The namespace fingerprint is typically a longer, randomly generated string.

The repository server keeps a table of the namespaces used in the repository and their corresponding fingerprints (these are the namespaces that are said to be registered in the repository). The registered namespaces are viewable through the administration screen of the Daisy Wiki, which also allows unregistering unused namespaces.

For export/import, namespace fingerprint information is included in the export, so that it can be verified upon import.
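
The verification step can be pictured as a comparison between the fingerprints in an export and those registered in the target repository. The sketch below is an assumption about how such a check could look, not a description of the actual import tool: in particular, rejecting the import on a mismatch and the fingerprint values shown are made up for the example.

```python
def check_import_namespaces(registered, incoming):
    """Compare namespace fingerprints from an export against the registered ones.

    Both arguments map namespace name -> fingerprint. Returns the names that
    are new to this repository; raises ValueError if a known name arrives
    with a different fingerprint.
    """
    new_names = []
    for name, fingerprint in sorted(incoming.items()):
        if name not in registered:
            new_names.append(name)
        elif registered[name] != fingerprint:
            raise ValueError(f"namespace {name!r} has a different fingerprint")
    return new_names


registered = {"DSY": "3f9a-example-fingerprint"}
print(check_import_namespaces(registered, {"DSY": "3f9a-example-fingerprint",
                                           "AWAN": "c07e-example-fingerprint"}))
# ['AWAN']: same DSY namespace, plus one namespace not yet registered
```

Two repositories that both kept the default name DSY would carry different fingerprints, so the check above would flag them as distinct namespaces despite the identical name.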

4.4.4 Namespacing of non-document entities

At the time of this writing, only documents are namespaced. Thus other entities, like the document, part and field types are not in a namespace. The export/import tools assume that if the name corresponds, they are the same.

4.5 Document Comments

This section is about document comments: comments that can be added to Daisy documents. More precisely, they are actually added to document variants, thus each variant of a document has its own comments.

4.5.1 Comment features

The current Daisy comments system is rather simple (text-only comments, no editing after creation, no threading) but nonetheless very useful.

4.5.1.1 Comment visibility

Each comment has a certain visibility:

4.5.1.2 Creation of comments

Everyone who has read access to a document can add comments to it. Editors-only comments can however only be created by users with write access to the document.

4.5.1.3 Deletion of comments

Comments can be removed from a document by the users who have write access to the document (this includes users acting in the Administrator role). A private comment can be deleted by its creator, regardless of whether that user has write access to the document the comment belongs to.

When a document is deleted, all its associated comments are removed too, including private ones that the deleter of the document may not be aware of.

4.5.2 Daisy Wiki specific notes

4.5.2.1 Guest user cannot create comments

The guest user, though it is for the repository server an ordinary user like any other, is not allowed to create comments via the Daisy Wiki. This means that to create comments, users should first log in.

4.5.2.2 'My Comments' page

Users can get a list of all the private comments they added to documents via a "My Comments" page (accessible via the drop-down menu behind the user name).

4.6 Query Language

4.6.1 Introduction

The Daisy Query Language can be used to search for documents (more precisely, document variants). In the Daisy Wiki, queries can be used in various places:

The implementation of various Daisy Wiki features is also based on queries, such as the recent changes page or the referrers page. And of course it is possible to execute queries from your own applications, using the HTTP interface or Java API.

The query language is a somewhat SQL-like language that makes it possible to search on various document properties (including the fields), full-text on the part content, or a combination of those. The sort order of the results can also be defined. The resulting document list is filtered to only include documents to which the user has at least read-live access.

An example query, searching all documents in a collection called "mycollection":

select id, name where InCollection('mycollection') order by name

Internally, non-fulltext queries are translated to SQL and executed on the relational database while fulltext queries are executed by Jakarta Lucene.

Although the query language is somewhat SQL-like, it hides the complexity of the actual SQL-queries that are performed by the repository server on the relational database, which can quickly grow quite complex.

Note: every time in this document when we talk about "searching documents", this is equivalent to "searching document variants". The result of a query is a set of document variants, i.e. each member of the result set is identified by a triple (document ID, branch, language).

4.6.2 Query Language

4.6.2.1 General structure of a query

select
  ...
where
  ...
order by
  ...
limit x
option
  ...

The select and where parts are required, the rest is optional. Whitespace is of no importance.

4.6.2.2 The select part

The select part should list one or more value expressions, separated by commas. A value expression can be an identifier, a literal or a function call. This is described in more detail further on.

4.6.2.3 The where part

The where part should contain a predicate expression, thus an expression which tests the value of value expressions using operators, or uses some built-in conditions.

Besides the operators listed in the table below, the operations AND and OR are supported, and parentheses can be used for grouping.

4.6.2.3.1 Operators & data types

Operator         string  long  double  decimal  date  datetime  boolean
=                X       X     X       X        X     X         X
!=               X       X     X       X        X     X         X
<                X       X     X       X        X     X
>                X       X     X       X        X     X
<=               X       X     X       X        X     X
>=               X       X     X       X        X     X
[NOT] LIKE       X
[NOT] BETWEEN    X       X     X       X        X     X
[NOT] IN         X       X     X       X        X     X
IS [NOT] NULL    X       X     X       X        X     X         X

Wildcards for LIKE are _ and %, escape using \_ and \%.

All keywords such as AND, LIKE, BETWEEN, ... can be written in either uppercase or lowercase (but not mixed case).

If these operators are used on multi-value fields, they return true if at least one of the values of the multi-value field satisfies the condition. See further on for a set of conditions specifically for multi-value fields.

Normally the comparison operators work on values of the same type, though there is some relaxation for compatible types, e.g. it is possible to compare between all numeric types, and between the date and datetime types.

4.6.2.4 Value expressions

A value expression is:

A function call usually has the following form:

functionName(arg1, arg2, ...)

However, for the basic mathematical functions (addition, subtraction, multiplication and division) "infix" notation is used instead, using the symbols +, -, * and /. Parentheses can be used to influence the order of the operations.

4.6.2.5 Identifiers

The table below lists the available identifiers.

Some notes:

name                       searchable  datatype  version dependent  remarks
id                         yes         string    no
namespace                  yes         string    no                 The namespace part of the document ID
name                       yes         string    yes
branch                     yes         symbolic  no
branchId                   yes         long      no
language                   yes         symbolic  no
languageId                 yes         long      no
link                       yes         link      no                 The current document variant as a link. This is useful for comparison with link type fields. For example $someLinkField = link to find documents which link to themselves in a certain field, or $someLinkField = ContextDoc(link) to find documents which link to the context document.
documentType               yes         symbolic  no
versionId                  yes         long      yes                ID of the live version, or if the query option search_last_version is specified, of the last version
creationTime               yes         datetime  no
ownerId                    yes         long      no
ownerLogin                 yes         symbolic  no
ownerName                  no          string    no
summary                    no          string    no                 always of last published version
retired                    yes         boolean   no
private                    yes         boolean   no
lastModified               yes         datetime  no
lastModifierId             yes         long      no
lastModifierLogin          yes         symbolic  no
lastModifierName           no          string    no
variantLastModified        yes         datetime  no
variantLastModifierId      yes         long      no
variantLastModifierLogin   yes         symbolic  no
variantLastModifierName    yes         string    no
%partTypeName.mimeType     yes         string    yes
%partTypeName.size         yes         long      yes
%partTypeName.content      no          xml       yes                only works for part types for which the flag 'daisy html' is set to true, and additionally the actual part must have the mime type 'text/xml'
versionCreationTime        yes         datetime  yes
versionCreatorId           yes         long      yes
versionCreatorLogin        yes         symbolic  yes
versionCreatorName         yes         string    yes
versionState               yes         symbolic  yes                'draft' or 'publish'
totalSizeOfParts           yes         long      yes                sum of the size of all parts in document
versionStateLastModified   yes         datetime  yes
lockType                   yes         symbolic  no                 'pessimistic' or 'warn'
lockTimeAcquired           yes         datetime  no
lockDuration               yes         long      no                 (in milliseconds)
lockOwnerId                yes         long      no
lockOwnerLogin             yes         symbolic  no
lockOwnerName              no          string    no
collections                yes         symbolic  no                 The collections (the names of the collections) the document belongs to. Behaves the same as a multi-value field with respect to applicable search conditions.
collections.valueCount     yes         symbolic  no                 The number of collections a document belongs to.
$fieldTypeName             yes                   yes                datatype depends on field type
$fieldTypeName.valueCount  yes         long      yes                Useful for multi-value fields. Searching for a value count of 0 does not work, use the "is null" condition instead.
$fieldTypeName.documentId  yes         string    yes                These special field sub-identifiers are only supported on fields of the type "link". For link field types, the $fieldTypeName identifier checks on the document ID, while these identifiers can be used to check on the branch and language.
$fieldTypeName.branch      yes         symbolic  yes
$fieldTypeName.branchId    yes         long      yes
$fieldTypeName.language    yes         symbolic  yes
$fieldTypeName.languageId  yes         long      yes
$fieldTypeName.namespace   yes         string    yes
#customFieldName           yes         string    no
score                      no          double    no                 The score of a document after doing a full text search. This score ranges from 0-1. When this identifier is used without the FullText() function it will just return 0.

4.6.2.5.1 Addressing components of multivalue and hierarchical field identifiers

For multivalue and hierarchical field identifiers, an index-notation is supported using square brackets.

For multivalue fields:

$SomeField[index]

For multivalue hierarchical fields:

$SomeField[index][index]

For non-multivalue hierarchical fields, or if you only want to specify an index for the hierarchical value, you can use:

$SomeField[*][index]

The index is 1-based. You can address elements starting from the end by using negative indexes, e.g. -1 for the last element in a multivalue or hierarchy path.

In case you are using a sub-field identifier, it should be put after the square brackets:

$SomeField[index].documentId

Specifying an out-of-range index doesn't give an error, but simply finds/returns nothing.
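
The indexing rules can be modelled in a few lines of Python. This is only a sketch of the documented behaviour (1-based, negative indexes counting from the end, nothing returned when out of range); element_at is a hypothetical name, and treating index 0 as out of range is an assumption.

```python
def element_at(values, index):
    """Return the element at a 1-based index; negative indexes count from the end.

    Out-of-range indexes yield None, mirroring "finds/returns nothing".
    """
    if index > 0:
        pos = index - 1
    elif index < 0:
        pos = len(values) + index
    else:
        return None  # assumption: index 0 is treated as out of range
    if 0 <= pos < len(values):
        return values[pos]
    return None


values = ["a", "b", "c"]
print(element_at(values, 1), element_at(values, -1), element_at(values, 4))
```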

4.6.2.6 Literals

4.6.2.6.1 String literals

Strings (text) should be put between single quotes, the single quote is escaped by doubling it, for example:

'''t is mooi weer vandaag'
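
When queries are generated by a program, the quoting rule above can be applied with a small helper such as the following Python sketch. The name quote_string is hypothetical; the function only wraps the text in single quotes and doubles any single quote inside it.

```python
def quote_string(text):
    """Turn arbitrary text into a query string literal."""
    return "'" + text.replace("'", "''") + "'"


print(quote_string("'t is mooi weer vandaag"))
```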

4.6.2.6.2 Numeric literals

These consist of digits (0-9), the decimal separator is a dot (.).

Numeric literals can be put between single quotes like strings, but it is not required to do so.

4.6.2.6.3 Date & datetime literals

Date format: 'YYYY-MM-DD'

Datetime format: 'YYYY-MM-DD HH:MM:SS'

4.6.2.6.4 Link literals

When searching on fields of type "link", the link should be specified as:

'daisy:docid'               (assumes branch main and language default)
'daisy:docid@branch'        (assumes language default)
'daisy:docid@branch:lang'   (branch can be left blank which defaults to main branch)

Branch and language can be specified either by name or ID.

So a search condition could be for example:

$someLinkField = 'daisy:35'
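
The three forms of a link literal can be produced by a helper like the one sketched below in Python. The function link_literal is hypothetical; it follows the forms listed above, including leaving the branch blank when only a language is given.

```python
def link_literal(doc_id, branch=None, language=None):
    """Format a link literal such as 'daisy:35@branch:lang'."""
    text = "daisy:" + str(doc_id)
    if language is not None:
        # The branch may be left blank, which defaults to the main branch.
        text += "@" + ("" if branch is None else str(branch)) + ":" + str(language)
    elif branch is not None:
        text += "@" + str(branch)
    return "'" + text + "'"


print("$someLinkField = " + link_literal(35))
```

Branch and language may be passed either as names or as numeric IDs, since both are simply converted to text.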

4.6.2.7 Special conditions for multi-value fields

$fieldName has all (value1, value2, value3, ...)

Tests that the multi-value field has all the specified values (and possibly more).

$fieldName has exactly (value1, value2, value3, ...)

Tests that the multi-value field has all the specified values, and none more. The order is not important.

$fieldName has some (value1, value2, value3, ...)
or
$fieldName has any (value1, value2, value3, ...)

has some and has any are synonyms. They test that the multi-value field has at least one of the specified values.

$fieldName has none (value1, value2, value3, ...)

Tests that the multi-value field has none of the specified values.

In addition to these conditions, you can use is null and is not null to check if a document has a certain (multi-value) field. The special sub-identifier $fieldName.valueCount can be used to check the number of values a multi-value field has.
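
The meaning of these conditions can be expressed with set operations. The Python sketch below is a model for illustration only, not how the repository server evaluates them; it assumes that duplicate values and ordering play no role.

```python
def has_all(field_values, wanted):
    """True if every wanted value is present (extra values are allowed)."""
    return set(wanted) <= set(field_values)


def has_exactly(field_values, wanted):
    """True if the field has the wanted values and no others, in any order."""
    return set(field_values) == set(wanted)


def has_some(field_values, wanted):
    """True if at least one wanted value is present."""
    return bool(set(field_values) & set(wanted))


has_any = has_some  # "has any" is a synonym of "has some"


def has_none(field_values, wanted):
    """True if none of the wanted values is present."""
    return not has_some(field_values, wanted)


print(has_all(["a", "b", "c"], ["a", "b"]), has_exactly(["a", "b", "c"], ["a", "b"]))
```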

4.6.2.8 Searching on hierarchical fields

4.6.2.8.1 matchesPath

For searching on hierarchical fields, a special matchesPath condition is available. It takes as argument an expression in which the elements of the hierarchical path are separated by a slash. For example, a basic usage is:

$fieldName matchesPath('/A/B/C')
$fieldName matchesPath('A/B/C')  -> the initial slash is optional

This would return all documents for which the hierarchical field has as value the path A/B/C.

The values should be entered using the correct literal syntax corresponding to the type of the field. For example, for link type fields, you would use:

matchesPath('daisy:10-FOO/daisy:11-FOO')

It is possible to use wildcards (placeholders) in the expression, namely * and **. One star (*) matches one path part. Two stars (**) match multiple path parts. Two stars can only be used at the very start or at the very end of the expression (not at both ends at the same time). Some examples to give an idea of what's possible:

$fieldName matchesPath('/A/*')
$fieldName matchesPath('/A/*/*')
$fieldName matchesPath('/A/*/B')
$fieldName matchesPath('/*/*/*')  -> finds all hierarchical paths of length 3
$fieldName matchesPath('/*/*/**') -> finds all hierarchical paths of at least length 3
$fieldName matchesPath('/A/**')   -> finds all paths of any length starting on A
                                     thus e.g. A/B, A/B/C or A/B/C/D.
$fieldName matchesPath('**/A')    -> finds all paths ending on A
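
To make the wildcard rules concrete, here is a small Python model of the matching. It is a sketch, not Daisy's implementation, and it assumes that ** stands for one or more path parts, which is inferred from the '/*/*/**' example ("at least length 3"). The function names are hypothetical.

```python
def _same(parts, pattern_parts):
    """Compare part by part; a single * matches exactly one path part."""
    return len(parts) == len(pattern_parts) and all(
        q == "*" or q == p for p, q in zip(parts, pattern_parts)
    )


def matches_path(path, pattern):
    """Model of matchesPath on slash-separated paths (initial slash optional)."""
    parts = [p for p in path.split("/") if p]
    pat = [p for p in pattern.split("/") if p]
    if "**" in pat[1:-1] or (len(pat) > 1 and pat[0] == "**" and pat[-1] == "**"):
        raise ValueError("** is only allowed at the very start or the very end")
    if pat and pat[-1] == "**":
        fixed = pat[:-1]
        # Assumption: ** needs at least one path part to match.
        return len(parts) > len(fixed) and _same(parts[:len(fixed)], fixed)
    if pat and pat[0] == "**":
        fixed = pat[1:]
        return len(parts) > len(fixed) and _same(parts[len(parts) - len(fixed):], fixed)
    return _same(parts, pat)


print(matches_path("A/B/C", "/A/**"), matches_path("A/B", "/*/*/**"))
```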

The argument of matchesPath should be a string, but doesn't have to be a literal. Some examples:

$fieldName matchesPath(Concat($anotherField, '/**'))

$fieldName matchesPath(Concat(ContextDoc(link), '/**'))

An example taken from Daisy's own knowledge base:
$fieldName matchesPath(String(ReversePath(
              GetLinkPath('KnowlegeBaseCategoryParent', 'true', ContextDoc(link)))))

4.6.2.8.2 Multi-value hierarchical fields

The matchesPath condition can also be used to search on multi-value hierarchical fields, in which case it will evaluate to true if at least one of the values of the multi-value field matches the path expression.

The special multi-value conditions such as 'has all', 'has some', etc. can also be used. There is no special syntax to specify hierarchical path literals in the query language, but they can be entered by using the Path function. For example:

$fieldName has all ( Path('/A/B/C'), Path('/X/Y/Z') )

The hierarchical paths specified using the Path function do not support wildcards.

4.6.2.8.3 Equals operator

When using the equals operator (=) or other binary operators with hierarchical fields, it will evaluate to true as long as there is one element in the hierarchy path which has the given value. For example, $MyField = 'b' will match a field whose value is "/a/b/c". This is similar to the behaviour of this operator for multivalue fields.

4.6.2.9 Link dereferencing

When an expression returns a link as value (most often this is in the form of a link field identifier, e.g. $SomeLinkField), then it is possible to 'walk through' this link to access properties of the linked-to document. This is known as link dereferencing.

The link dereferencing operator is written as "=>". Other languages sometimes use a dot (.) or "->" for dereferencing. However, since the dash is a valid character in identifiers in Daisy, and the dot is already used to access 'sub-field identifiers' (like %SomePart.mimeType), these could not be used.

[link expression]=>[identifier]

A practical example:

select name, $SomeLinkField=>name where $SomeLinkField=>name like 'A%' order by $SomeLinkField=>name

As shown in this example, the link dereferencing operator works in the select, where and order by parts of the query.

Link dereferencing can work multiple levels deep, e.g.

$SomeLink=>$SomeOtherLink=>name

If documents are linked together with the same type of field, this could of course be something like:

$SomeLink=>$SomeLink=>$SomeLink=>name

When a link is dereferenced in the where clause of a query and the user does not have access to the dereferenced document, the where clause is considered to evaluate to 'false', i.e. the row is excluded from the result set, since without access to the document it is not possible to know whether it would evaluate to 'true'. Accessing non-accessible values in the select or order by clauses returns a 'null' value.

4.6.2.10 Other special conditions

4.6.2.10.1 InCollection

InCollection('collectionname' [, collectionname, collectionname])

Searches documents contained in at least one of the specified collections. To search documents that occur in multiple collections (thus in the intersection of those collections), use the function InCollection multiple times with AND in between: InCollection('collection1') and InCollection('collection2'). This also works for OR but in that case it is more efficient to give the collections as arguments to one InCollection call.

Instead of the InCollection condition, you can use the collections identifier in combination with the multi-value field search conditions such as has some, has all or has none for more powerful search possibilities. The InCollection condition predates the existence of multi-value fields, but remains supported.

4.6.2.10.2 LinksTo, LinksFrom, LinksToVariant, LinksFromVariant

LinksTo(documentId, inLastVersion, inLiveVersion [, linktypes])
LinksFrom(documentId, inLastVersion, inLiveVersion [, linktypes])
LinksToVariant(documentId, branch, language, inLastVersion, inLiveVersion [, linktypes])
LinksFromVariant(documentId, branch, language, inLastVersion, inLiveVersion [, linktypes])

Searches documents which link to or from the specified document (or document variant). The other two parameters, inLastVersion and inLiveVersion, are interpreted as booleans: 0 is false, any other (numeric) value is true.

If inLastVersion is true, only documents whose last version links to the specified document are included.

If inLiveVersion is true, only documents whose live version links to the specified document are included.

If both parameters are true or both are false, all documents are returned for which either the last or the live version links to the specified document.

The optional parameter linktypes is a string containing a comma or whitespace separated list of the types of links to include, which is one or more of: inline, out_of_line, image, include or other.
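
The interplay of the inLastVersion and inLiveVersion parameters can be summarised as a small decision function. The Python sketch below only models the rules stated above; link_matches and its last_links and live_links inputs are hypothetical names for "the last version links to the target" and "the live version links to the target".

```python
def link_matches(in_last_version, in_live_version, last_links, live_links):
    """Decide whether a document is included, given the two numeric flags.

    in_last_version / in_live_version: 0 is false, any other value is true.
    last_links / live_links: hypothetical inputs telling whether the last or
    live version of the candidate document links to the target.
    """
    want_last = in_last_version != 0
    want_live = in_live_version != 0
    if want_last == want_live:
        # Both true or both false: either version may contain the link.
        return last_links or live_links
    if want_last:
        return last_links
    return live_links


print(link_matches(1, 0, last_links=False, live_links=True))
```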

4.6.2.10.3 IsLinked, IsNotLinked

IsLinked()
IsNotLinked()

IsLinked() evaluates to true for any document which is linked by other documents, IsNotLinked() evaluates to true for any document that is not linked from any other document (thus not reachable by following links in documents, the navigation tree, or linked by the content of other parts on which link extraction is performed).

4.6.2.10.4 HasPart

HasPart('partTypeName')

Searches documents which have a part of the specified part type. This search is version-dependent.

4.6.2.10.5 HasPartWithMimeType

HasPartWithMimeType('some mimetype')

Searches documents having a part with the given mime type. This search is version-dependent. This uses a 'like' condition, thus the % wildcard can be used in the parameter. For example, to search all images: HasPartWithMimeType('image/%')

4.6.2.10.6 DoesNotHaveVariant

DoesNotHaveVariant(branch, language)

Searches documents that do not have the specified variant. See also the page on variants for more information.

4.6.2.11 Functions

The following functions can be used in value expressions.

4.6.2.11.1 String functions
4.6.2.11.1.1 Concat

Syntax:

Concat(value1, value2, ...) : string

Concatenates multiple strings. If the arguments are not strings, they are converted to a string using the same logic as the String function.

4.6.2.11.1.2 Length

Syntax:

Length(string) : long

Returns the length of its string argument.

4.6.2.11.1.3 Left

Syntax:

Left(string, length) : string

Returns 'length' leftmost characters from the string. If 'length' is larger than the string, the whole string is returned. If length is 0, an empty string is returned.

4.6.2.11.1.4 Right

Syntax:

Right(string, length) : string

Returns 'length' rightmost characters from the string. If 'length' is larger than the string, the whole string is returned. If length is 0, an empty string is returned.

4.6.2.11.1.5 Substring

Syntax:

Substring(string, position, length) : string

Returns a string formed by taking 'length' characters from the string at the specified position. The 'length' argument is optional, if not specified, it will go till the end of the input string. The 'position' argument starts at 1 for the first character.
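
The behaviour of Left, Right and Substring described above can be mirrored in Python as follows. This is a sketch for illustration; the lowercase function names are hypothetical, and rejecting a position below 1 is an assumption, since the text does not say what happens in that case.

```python
def left(s, length):
    """Leftmost 'length' characters; the whole string if 'length' is larger."""
    return s[:length] if length > 0 else ""


def right(s, length):
    """Rightmost 'length' characters; the whole string if 'length' is larger."""
    return s[-length:] if length > 0 else ""


def substring(s, position, length=None):
    """Take 'length' characters starting at a 1-based position."""
    if position < 1:
        raise ValueError("position is 1-based")  # assumption, see lead-in
    start = position - 1
    if length is None:
        return s[start:]  # no length given: go to the end of the string
    return s[start:start + length]


print(left("daisy", 2), right("daisy", 2), substring("daisy", 2, 3))
```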

4.6.2.11.1.6 UpperCase

Syntax:

UpperCase(string) : string

4.6.2.11.1.7 LowerCase

Syntax:

LowerCase(string) : string

4.6.2.11.1.8 String

Syntax:

String(value) : string

Converts its argument to a string.

Some of the behaviours:

4.6.2.11.2 Date and datetime functions
4.6.2.11.2.1 CurrentDate

Syntax:

CurrentDate(spec?) : date

Returns the current date.

The optional spec argument allows you to specify an offset to the current date. It is a string with the following syntax:

+/- <num> (days|weeks|months|years)

For example:

CurrentDate('- 7 days')
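
To show what such a spec means, the Python sketch below parses it and applies the offset to a given date. It is an illustration only: apply_date_spec is a hypothetical helper, the tolerance for whitespace is an assumption, and so is the choice to clamp the day when the target month is shorter (for example 31 March plus one month).

```python
import calendar
import re
from datetime import date, timedelta

_SPEC = re.compile(r"^\s*([+-])\s*(\d+)\s*(days|weeks|months|years)\s*$")


def apply_date_spec(today, spec):
    """Apply an offset spec such as '- 7 days' to the date 'today'."""
    match = _SPEC.match(spec)
    if match is None:
        raise ValueError("unrecognised spec: %r" % spec)
    sign = 1 if match.group(1) == "+" else -1
    amount = sign * int(match.group(2))
    unit = match.group(3)
    if unit == "days":
        return today + timedelta(days=amount)
    if unit == "weeks":
        return today + timedelta(weeks=amount)
    # Months and years: shift a running month index, then rebuild the date.
    months = amount if unit == "months" else amount * 12
    index = today.year * 12 + (today.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    # Assumption: clamp the day when the target month is shorter.
    day = min(today.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


print(apply_date_spec(date(2024, 3, 31), "- 7 days"))
```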

4.6.2.11.2.2 CurrentDateTime

Syntax:

CurrentDateTime(spec?) : datetime

Returns the current datetime.

The optional spec argument allows you to specify an offset to the current datetime. It is a string with the following syntax:

+/- <num> (seconds|minutes|hours|days|weeks|months|years)

For example:

CurrentDateTime('- 3 hours')

4.6.2.11.2.3 Year, Month, Week, DayOfWeek, DayOfMonth, DayOfYear

These functions all take a date or datetime as argument, and return a long value.

DayOfWeek returns a value in the range 1-7, where 1 is sunday.

For the Week function, the first week of the year is the first week containing a sunday.
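
The DayOfWeek numbering differs from some other conventions, for example ISO 8601, where Monday is 1 and Sunday is 7. The Python sketch below converts Python's ISO weekday to the 1 = Sunday numbering described above; day_of_week is a hypothetical helper name.

```python
from datetime import date


def day_of_week(d):
    """Return 1-7 with 1 = Sunday, matching the DayOfWeek convention."""
    # isoweekday() gives Monday = 1 ... Sunday = 7.
    return d.isoweekday() % 7 + 1


print(day_of_week(date(2024, 1, 7)))  # 7 January 2024 was a Sunday
```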

4.6.2.11.2.4 RelativeDate, RelativeDateTime

These functions take one string argument consisting of 3 words, each one taken from the following groups:

start    this     week
end      last     month
         next     year

So for example:

RelativeDate('start this month')

returns a date set to the first day of the current month.

4.6.2.11.3 Numeric functions
4.6.2.11.3.1 +, -, * and /

The basic mathematical operations.

4.6.2.11.3.2 Random

Returns a pseudo-random double value greater than or equal to 0 and less than or equal to 1.

4.6.2.11.3.3 Mod

Syntax:

Mod(number1, number2)

4.6.2.11.3.4 Abs, Floor, Ceiling

These functions all take one number as argument.

4.6.2.11.3.5 Round

Syntax:

Round(number, scale)

Rounds the given number to have at most scale digits to the right of the decimal point.

4.6.2.11.4 Special
4.6.2.11.4.1 ContextDoc

Syntax:

ContextDoc(expression [, position])

In some cases a context document is available when performing a query. For example, when a query is embedded inside a document, that document serves as the context document. It is possible to evaluate expressions on this context document by use of this ContextDoc function. The optional position argument allows to climb up in the stack of context documents (which is available in publisher requests).

Examples:

ContextDoc(id) -- the id of the context document
ContextDoc($someField) -- the value of a field of the context document
ContextDoc(Concat(name, ' ', $someField))

4.6.2.11.4.2 UserId

Returns the ID of the current user (= the user executing the query).

UserId()  ->  function takes no arguments

4.6.2.11.4.3 Path

Converts its argument to a hierarchical path literal. This function is useful because there is no special query language syntax for entering hierarchical path literals. The argument should be a slash-separated hierarchical path, e.g.:

Path('/A/B/C')
Path('A/B/C')  -> the initial slash is optional

4.6.2.11.4.4 GetLinkPath

Syntax:

GetLinkPath(linkFieldName, includeCurrent, linkExpr)

Returns a hierarchical path formed by following a chain of documents linked through the specified link field.

The optional boolean argument includeCurrent indicates whether the current document should be part of the hierarchical path.

The optional linkExpr argument can be used to specify the start document, if it is not the current document.

Note that this expression can only be used in the select part of queries, or in the where clause if it is evaluated before performing the search. For example, as argument of matchesPath.

4.6.2.11.4.5 ReversePath

Reverses the order of the elements in a hierarchical path. This is useful, for example, in combination with GetLinkPath.

Syntax:

ReversePath(hierarchical-path)

4.6.2.12 Full text queries

4.6.2.12.1 FullText() function

For full text queries, the where part takes a special form. There are two possibilities: either only a full text search is performed, or the fulltext query is further restricted using 'normal' conditions. The two possible forms are:

... where FullText('word')
or
... where FullText('word') AND <other conditions>
for example:
... where FullText('word') AND $myfield = 'abc' AND InCollection('mycollection')

Note that the combining operator between the FullText condition and other conditions is always AND, thus the result of the full text query is further refined. The further conditions can of course be of any complexity, and can thus again contain OR.

The FullText clause needs to be the first after the word "where", it cannot appear at arbitrary positions in the where-clause.

If no order by clause is included when doing a full text query, the results are ordered according to the score assigned by the fulltext search engine.

The parameter of the FullText(...) function is a query which is passed on to the full text engine, in our case Lucene. See the Lucene documentation for the query syntax.

The FullText() function can have 3 additional parameters which indicate if the search should be performed on the document name, document content or field content. By default, all three are searched. These parameters should be numeric: 0 indicates false, and any other value true.

For example:

FullText('word', 1, 0, 0)

Searches for 'word', but only in the document name.

Additionally, you can specify a branch and language as parameters to the FullText function, to specify that only documents of that branch/language should be searched. Thus the full syntax of the FullText function is:

FullText(lucene query, searchInName, searchInContent, searchInFields, branch, language)

Specifying the branch and language as part of the FullText function is more efficient than using:

FullText(lucene query) and branch = 'my_branch' and language = 'my_language'

4.6.2.12.2 FullTextFragment() function

The FullTextFragment() function returns contextualized text fragments of the sought-after terms. It should be used in the select part of the query. By default, it returns only the first text fragment found. The fragments are returned as XML with the following structure:

<html>
  <body>
    <div class="fulltext-fragment">
      ...  full text fragment ... <span class="fulltext-hit">the term</span> ... more text ...
    </div>
    ...
  </body>
</html>

Usage of the function when you only wish to receive one fragment (the default):

select FullTextFragment() where FullText('word')

If you wish to have more text fragments you can specify the amount of fragments as a function parameter.

select FullTextFragment(5) where FullText('word')

This function will only return fragments from the content of the document.  This means that context from document name or fields will not appear in the result.
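
A client that receives this XML might pull out the fragment text and the highlighted terms as sketched below in Python. The sample document is made up to follow the structure shown above, extract_fragments is a hypothetical helper, and the code assumes the elements carry no XML namespace.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the documented shape, for illustration only.
SAMPLE = """<html>
  <body>
    <div class="fulltext-fragment">some text <span class="fulltext-hit">word</span> more text</div>
    <div class="fulltext-fragment">another <span class="fulltext-hit">word</span> here</div>
  </body>
</html>"""


def extract_fragments(xml_text):
    """Return a list of (fragment text, list of hit terms) tuples."""
    root = ET.fromstring(xml_text)
    fragments = []
    for div in root.iter("div"):
        if div.get("class") != "fulltext-fragment":
            continue
        text = "".join(div.itertext()).strip()
        hits = [span.text for span in div.iter("span")
                if span.get("class") == "fulltext-hit"]
        fragments.append((text, hits))
    return fragments


for text, hits in extract_fragments(SAMPLE):
    print(text, hits)
```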

4.6.2.13 The order by part

The order by part is optional.

The order by part contains a comma-separated list of value expressions, each optionally followed by ASC or DESC to indicate ascending (the default) or descending order. The expressions listed here have no connection with those in the select part, i.e. they do not have to be a subset of those.

"null" values are put at the end (when using ASC order).

4.6.2.14 The limit part

This can be used to limit the number of results returned from a query. This part is optional.

4.6.2.15 The option part

The option part allows to specify options that influence the execution of the query. The options are defined as:

option_name = 'option_value' (, option_name = 'option_value')*

Supported options:

name                  value                          default
include_retired       true/false                     false
search_last_version   true/false                     false
style_hint            (anything)                     (empty)
annotate_link_fields  true/false                     true
chunk_offset          an integer (start-index is 1)  N/A
chunk_length          an integer                     N/A

include_retired is used to indicate that retired documents should be included in the result (by default they are not).

search_last_version is used to indicate that the last version of metadata should be searched and retrieved, instead of the live version. When using this, documents that do not have a live version will also be included in the query result (otherwise they are not included). Full text searches are always performed on the live data, regardless of whether this option is specified.

style_hint is used to supply a hint to the publishing layer for how the result of the query should be styled. The repository server does not do anything more than add the value of this option as an attribute on the generated XML query results (<searchResult styleHint="my hint" ...). It is then up to the publishing layer to pick this up and do something useful with it. For how this is handled in the DaisyWiki, see the page on Query Styling.

annotate_link_fields indicates whether selected fields of type "link" should be annotated with the document name of the document pointed to by the link. If you don't need this, you can disable this to gain some performance.

chunk_offset and chunk_length allow to retrieve a subset of the query results. This is useful for paged display of the query results.
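
For paged display, the chunk options for a given page can be computed as in the following Python sketch. The helper chunk_options and the idea of 1-based page numbers are assumptions for illustration; the only fact taken from the table above is that chunk_offset starts at 1.

```python
def chunk_options(page, page_size):
    """Build the option clause for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    # chunk_offset is 1-based: page 1 starts at result 1.
    offset = (page - 1) * page_size + 1
    return "option chunk_offset = '%d', chunk_length = '%d'" % (offset, page_size)


print("select id, name where true order by name " + chunk_options(3, 10))
```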

4.6.3 Example queries

4.6.3.1 List of all documents

select id, name where true

4.6.3.2 Search on document name

select id, name where name like 'p%' order by creationTime desc limit 10

4.6.3.3 Show the 10 largest documents

select id, name, totalSizeOfParts where true order by totalSizeOfParts desc limit 10

4.6.3.4 Show documents of which the last version has not yet been published

select id, name, versionState, versionCreationTime
  where versionState = 'draft' option search_last_version = 'true'

4.6.3.5 Overview of all locks

select id, name, lockType, lockOwnerName, lockTimeAcquired, lockDuration
  where lockType is not null

4.6.3.6 All documents having a part containing an image

select id, name where HasPartWithMimeType('image/%')

4.6.3.7 Order documents randomly

select name where true order by Random()

4.6.3.8 Documents ordered by length of their name

select name, Length(name) where true order by Length(name) DESC

4.7 Full Text Indexer

Full text indexing in Daisy happens automatically when document variants are updated, so you do not need to worry about updating the index yourself. Technically, the full text indexer has a durable subscription on the JMS events generated by the repository, and it is these events that trigger the index updating.

4.7.1 Technology

Daisy uses Jakarta Lucene as full-text indexer.

4.7.2 Included content

Only document variants which have a live version are included in the full text index. Thus retired document variants or document variants having only draft versions are not included. It is the content of the live version which is indexed, thus full text search operations always search on the live content.

For each document variant, the included content consists of the document name, the value of string fields, and text extracted from the parts. For the parts, text extraction will be performed on the data if the mime type is one of the following:

Mime type                        Comment
text/plain
text/xml                         e.g. the "Daisy HTML" parts
application/xhtml+xml            XHTML documents
application/pdf                  PDF files
application/vnd.sun.xml.writer   OpenOffice Writer files
application/msword               Microsoft Word files
application/vnd.ms-word
application/mspowerpoint         Microsoft Powerpoint files
application/vnd.ms-powerpoint
application/msexcel              Microsoft Excel files
application/vnd.ms-excel

Support for other formats can be added by implementing a simple interface. Ask on the Daisy Mailing List if you need more information about this.

4.7.3 Index management

4.7.3.1 Optimizing the index

If you have a lot of documents in the repository, you can speed up fulltext searches by optimizing the index.

This is done as follows.

First go to the JMX console as explained in the JMX console documentation.

Follow the link titled: Daisy:name=FullTextIndexer

Look for the operation named optimizeIndex and invoke it (by pressing the Invoke button).

Afterwards, choose "Return to MBean view". In the IndexerStatus field, you will see an indication that the optimizing of the index is in progress. If you have a very small index, the optimizing might go so fast that it is already finished by the time you get back to that page. On larger indexes, the optimize procedure can take quite a bit of time.

4.7.3.2 Rebuilding the fulltext index

Rebuilding the fulltext index can be useful in a variety of situations:

The index can be rebuilt for all documents or for a selection of the documents.

If you want to completely rebuild the index, you might first want to delete all the index files, which can be found in:

<daisydata dir>/indexstore

It is harmless to delete these, as the index can be rebuilt at any time. However, do not delete them while the repository server is running.

To trigger the rebuilding, go to the JMX console as explained in the JMX console documentation.

Follow the link titled: Daisy:name=FullTextIndexUpdater

In case you want to rebuild the complete index, invoke the operation named reIndexAllDocuments.

In case you want to rebuild the index only for some documents, you can use the operation reIndexDocuments. As parameter, you need to enter a query to select the documents to reindex. For example, to re-index all documents containing PDFs, you can use:

select id where HasPartWithMimeType('application/pdf')

What you put in the select-clause of the query doesn't matter.

After invoking the reindex operation, choose "Return to MBean view". Look at the attribute ReindexStatus. This shows the progress of the reindexing or, more precisely, of scheduling the reindexing (refresh the page to see its value being updated). It is important that this ends completely before the repository server is stopped, otherwise the reindexing will not happen completely.

If you have a large repository, the ReindexStatus might show the message "Querying the repository to retrieve the list of documents to re-index" for a long time. This can happen when the repository has just been started, because the documents still need to be loaded into the cache.

Note that the reindexing here only pushes reindex-jobs to the work queue of the fulltext indexer, the reindexing doesn't happen immediately.

To follow up the status of the actual indexing, go again to the start page of the JMX console, by choosing the "Server view" tab.

Over there, follow this link: org.apache.activemq:BrokerName=DaisyJMS,Type=Queue,Destination=fullTextIndexerJobs

Look for the attribute named QueueSize. This indicates the number of jobs waiting for the fulltext indexer to process. Each time you refresh this page, you will see this number go lower (or higher if new jobs are being added faster than they are processed).

If you have a large index, it could be beneficial to optimize it after the reindexing finished, as explained above.

4.8 User Management

All operations on the Daisy Repository Server are performed as a certain user acting in one or more roles. For this purpose, the Repository Server has a user management module to define the users and the roles. The authentication of the users is done by a separate component, which makes it possible to plug in custom authentication techniques.

4.8.1 User Management

Users and roles are uniquely and permanently identified by a numeric ID, but they also have respectively a unique login and unique name.

A user has one or more roles. After logging in, it is both possible to have just one role active and let the user manually switch between his/her roles, or to have all roles of a user active at the same time (which is the behaviour traditionally associated with user groups). If a user has a default role, this role will be active after login. If no default role for the user is specified, all its roles will become active after login, with the exception of the Administrator role (if the user would have this role). This is because the Administrator role allows to do everything, which would then defeat the purpose of having other roles. If the user only has the Administrator role, then obviously that one will become active after login.
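
The rules for which roles become active after login can be written down as a short function. The Python sketch below models only the behaviour described in this paragraph; active_roles_after_login is a hypothetical name, and roles are represented by their names for readability.

```python
ADMINISTRATOR = "Administrator"


def active_roles_after_login(roles, default_role=None):
    """Return the roles that become active after login."""
    if default_role is not None:
        # A default role, if set, is the only active role after login.
        return [default_role]
    others = [role for role in roles if role != ADMINISTRATOR]
    if others:
        # No default role: all roles except Administrator become active.
        return others
    # The user only has the Administrator role, so that one becomes active.
    return list(roles)


print(active_roles_after_login(["Administrator", "editor", "reviewer"]))
```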

Users have a boolean flag called updateable by user: this indicates whether a user can update his/her own record. If true, a user can change his/her first name, last name, email and password. Role membership cannot be changed by the user, and neither can the login. It is useful to turn this flag off for "shared users", for example the guest user in the Daisy Wiki application.

The Confirmed and ConfirmKey fields are used to support the well-known email-based verification mechanism in case of self-registration. If the Confirmed flag is false a user will not be able to log in.

4.8.1.1 The Administrator role

The repository server has one predefined role: Administrator (ID: 1). People having the role of Administrator as active role have a whole bunch of special privileges:

4.8.1.2 Predefined users and roles

4.8.1.2.1 $system

$system is a bootstrap user internally needed in the repository. The user $system cannot log in, so its password is irrelevant. This user should not (and cannot) be deleted, nor should it be renamed. Simply don't worry about it.

4.8.1.2.2 internal

The user "internal" is a user created during the initialisation of the Daisy repository. The user is used by various components that run inside the repository server to talk to the repository. By default, we also use this user in the repository client component that runs inside Cocoon, which needs a user to update its caches.

The internal user has (and should have) the Administrator role.

During installation, this user is assigned a long, randomly generated password (you can see it in the myconfig.xml or cocoon.xconf).

4.8.1.2.3 guest user and guest role (Daisy Wiki)

The Daisy Wiki predefines a user called guest and a role called guest. This user has the password "guest". This is the user that becomes automatically active when surfing to the Daisy Wiki application, without needing to log in. After initialisation of the Daisy Wiki, the ACL is configured to disallow any write operations for users having the guest role.

4.8.1.2.4 registrar (Daisy Wiki)

The registrar user is the user that will:

During installation, this user is assigned a long, randomly generated password (you can see it in the cocoon.xconf).

4.8.2 Authentication Schemes

Daisy provides its own password authentication, but it is also possible to delegate the authentication to an external system. At the time of this writing, Daisy ships with support for authentication using LDAP and NTLM. It is possible to configure multiple authentication schemes and to have different users authenticated against different authentication schemes.

The authentication schemes are configured in the myconfig.xml file (which is located in <daisy-data-dir>/conf). Just search on "ldap" or "ntlm" and you'll see the appropriate sections. After making changes there, you will need to restart the repository server. To let users use the newly defined authentication scheme(s), you need to edit their settings via the user editor on the administration pages.

Daisy does not do automatic synchronisation of user information (such as updating the e-mail address based on what is stored in LDAP), but it is possible to auto-create users on first login. This means that when a user logs in to Daisy for the first time, and does not yet exist in Daisy, an authentication scheme is given the possibility to create the user (if it exists in the external system). To enable this feature, search in the myconfig.xml file for "authenticationSchemeForUserCreation".

To debug authentication problems, look at the log files in <daisy-data-dir>/logs/daisy-request-errors-<date>.log. Problems in the configuration of the authentication schemes do not ripple through over the HTTP interface of the repository, thus are not visible in the Daisy Wiki.

4.8.2.1 Implementing new authentication schemes

For a tutorial, see here.

For real samples, simply look at the source code of the NTLM and LDAP schemes. To do so, download the Daisy source code; you'll find them in the following directories:

services/ldap-auth
services/ntlm-auth

4.9 Access Control

4.9.1 Introduction

This document explains Daisy's features for access control: the authorisation of document operations such as read and write.

While we usually talk about documents, technically the access control happens on the document variant level: a user is granted or denied access to a certain document variant.

In many systems, access control is configured by having access control lists (ACLs) attached to documents. These ACLs contain access control rules which specify, for certain users or roles (groups), what operations they can or cannot perform.

For Daisy, it was considered to be too laborious to manage ACLs for each individual document. Therefore, there is one global ACL, where you can select sets of documents based on an expression and then define the access control rules that apply to these documents.

4.9.2 Structure of the ACL

The structure of the ACL is illustrated by the diagram below.

In ACL terminology, an object is the protected resource, and a subject is an entity wanting to perform an operation on the object. The objects in our case are documents, selected using an expression. The subjects are users, which can be living organisms, usually humans, or programs acting on their behalf.

As will become clear when reading about the evaluation of the ACL below, the order of the entries in the ACL is important.

4.9.2.1 Object specification

The expression used to select documents in the object specification uses the same syntax as the where clause of a query in the Daisy Query Language. However, the number of identifiers that are available is severely limited. More specifically, you can test on the following things:

Some examples of expressions:

InCollection('mycollection')

documentType = 'Navigation' and InCollection('mycollection')

$myfield = 'x' or $myotherfield = 'y'

For the evaluation of these expressions, the data of the fields in the last version is used, not the data from the live version.

4.9.2.2 Access Control Entry

See diagram.

If the subject type is everyone, the subject value should be set to -1.

If you give 'read live' rights to someone, they are able to:

If you give 'read' rights to someone, they have full read access to the document (thus they can view all versions and the list of versions).

If you give 'write' rights to someone, they are able to:

The 'delete' right gives users the possibility to delete documents or document variants.

If you give 'publish' rights to someone, they are able to change the publish/draft state of versions of documents.

4.9.2.3 Staging and Live ACL

In Daisy, there are two ACLs: a staging ACL and a live ACL. Only the staging ACL is directly editable. The only way to update the ACL is to first edit the staging ACL, and then put it live (= copy the staging ACL over the live ACL).

Before putting it live, it is possible to first test the staging ACL: you can give a document id, a role id and a user id and get the result of ACL evaluation in return, including an explanation of which ACL rules made the final decision.

In the Daisy Wiki front end, all these operations are available from the administration console. It is recommended that after editing the ACL, you first test it before putting it live, so that you are sure there are no syntax errors in the document selection expressions.

4.9.3 Evaluation of the ACL: how it is determined whether someone gets access to a document

The determination of the authorisation of the various operations for a certain document happens as follows:

Further notes:

4.9.4 Other security aspects

This document only discussed authorisation of operations on documents for legitimate users. Other aspects of security include:

4.10 Email Notifier

4.10.1 General

Daisy can send out emails when changes are made to documents. To make use of this the SMTP host must be correctly configured, which is usually done as part of the installation, but can be changed afterwards (see below). In the Daisy Wiki, individual users can subscribe to get notifications by selecting the "User Settings" link, making sure their email address is filled in, and checking the checkbox next to "Receive email notifications of document-related changes.".

Users will only receive notifications for documents to which they have at least read (not 'read live') access rights. It is possible to receive notifications for individual documents, for all documents belonging to a certain collection, or for all documents. The mails will report document creation, document updates or version state changes.

While we usually talk about documents, the actual notifications happen on the document variant level.

As you can see on the User Settings page, it is also possible to subscribe to other events: user, schema, collection and ACL related changes. However, for these events proper formatting of the mails is not yet implemented; they simply contain an XML dump of the event.

4.10.2 Configuration

Configuration of the email options happens in the <DAISY_DATA>/conf/myconfig.xml file. There you can configure:

After making any changes to the myconfig.xml file, the repository server needs to be restarted.

4.10.3 Ignoring events from users

Sometimes when doing automated document updates, a lot of change events might be produced, and it can be undesirable to produce email notifications for these events.

For this, the email notifier can be configured to ignore the events caused by certain users. For example, if you have an application which connects as user "john" to the repository server, then it is possible to say that events (document updates etc.) caused by john should not result in email notifications.

The users whose events should be ignored can be configured in the myconfig.xml (use this for permanent settings), but can also be changed at runtime through the JMX console (use this for temporarily disabling notifications for a user).

4.10.4 Implementation notes

The email notifier is an extension component running inside the repository server. It is independent of the Daisy Wiki. The email notifier provides a Java API for managing the subscriptions, as well as additions to the HTTP+XML interface (logical, because that's how the implementation of the Java API talks to the repository).

4.11 Document Task Manager

The purpose of the Document Task Manager (DTM) is to perform a certain task across a set of documents. The DTM is an optional component running inside the Daisy Repository Server. Some of its features are:

Tasks are executed in the background, inside the repository server. Thus the user (a person or another application) starting the task does not have to wait until it is completed, but can do something else and check later if the task ended successfully.

The execution progress of the task is maintained persistently in the database. For each document on which the task needs to be executed, you can consult whether it has been performed successfully, whether it failed (and why), or whether it still has to be executed. Since this information is tracked persistently in the database, it is not lost in case the server would be interrupted.

Tasks can be interrupted. Since the task is performed on one document after another, it is easily possible to interrupt between two documents.

Tasks can be written in Javascript or be composed from built-in actions. Executing custom Javascript-based tasks is only allowed for Administrators, since there is a certain risk associated with it. For example, it is possible to write a task containing an endless loop which would only be interruptible by shutting down the repository server, or a task could call System.exit() to shut down the server.

The execution details of a task, which are stored in the database, are cleaned up automatically after two weeks (by default), and can of course also be deleted manually.

The DTM is accessible via the HTTP API and the Java API.

The Daisy Wiki contains a frontend for starting new tasks and consulting the execution details of existing tasks.

Ideas for the future:

4.12 Publisher

4.12.1 Introduction

The publisher is an extension component running in the content repository server.

Its original goal was to retrieve in one remote call the information you need to display on a page. The result is returned as an XML document.

However, the information that can be requested from the publisher consists of more than just XML dumps of repository entities. The publisher provides all sorts of extra functionality, such as 'prepared documents' for publishing with support for document-dependent content aggregation, or performing diffs between document versions.

The publisher was developed to support the needs of the Daisy Wiki, but is useful for other applications as well. Suggestions for features (or patches) are of course welcomed.

4.12.2 The publisher request format

A publisher request is an XML document with as root element p:publisherRequest, and containing various instructions. This is the full list of available instructions:

Name
p:aclInfo
p:annotatedDocument
p:annotatedVersionList
p:availableVariants
p:choose
p:comments
p:diff
p:document
p:forEach
p:group
p:ids
p:if
p:myComments
p:navigationTree
p:performFacetedQuery
p:performQuery
p:preparedDocuments (& p:prepareDocument)
p:publisherRequest
p:resolveDocumentIds
p:resolveVariables
p:selectionList
p:shallowAnnotatedVersion
p:subscriptionInfo
p:variablesConfig
p:variablesList

4.12.3 Concepts

4.12.3.1 Two kinds of publisher requests

A publisher request takes the form of an XML document describing the various pieces of information you want the publisher to return. The publisher request is sent to the Publisher component, and the Publisher answers with a big XML response.

Next to the publisher requests that are sent to the Publisher, the Publisher can also execute additional publisher requests as part of the p:preparedDocuments instruction. These additional publisher requests are stored in a directory accessible by the Publisher, usually this is:

<repodata dir>/pubreqs/

The format of these publisher requests is exactly the same.

4.12.3.2 Context document stack

The p:document instruction in a publisher request pushes a document on the context document stack. A good part of the publisher instructions need a context document on which to operate.

In expressions or in queries (such as in p:performQuery), the context document stack can be accessed using the ContextDoc(expr[, level]) function. The optional level argument of the ContextDoc function describes how high to go up in the context doc stack. See the query language reference for details.

4.12.3.3 Expressions

The parameters of some publisher instructions, specified in attributes and child-elements, can contain expressions, rather than just a literal value.

To specify an expression, the attribute or element must start with ${ and end on }. For example:

<element attribute="${some expr}">${some expr}</element>

Using multiple expressions or having additional content around the expression is not supported.

The expressions are Daisy query-language expressions. The identifiers apply to the current context document. For example the expression ${id} would evaluate to the ID of the current context document.

4.12.4 Testing a publisher request

The Publisher can easily be called using the HTTP interface. Just create an XML file containing the publisher request, and submit it using a tool like wget, which is available on many Unix systems (there's a Windows version too).

For example, create a file called pubreq.xml containing something like this:

<?xml version="1.0"?>
<p:publisherRequest
  xmlns:p="http://outerx.org/daisy/1.0#publisher"
  locale="en-US">

  <p:document id="1-DSY">
    <p:aclInfo/>
    <p:availableVariants/>
    <p:annotatedDocument/>
    <p:annotatedVersionList/>
  </p:document>

</p:publisherRequest>

The above example assumes a document with id 1-DSY exists. If not, just change the document id.

Now we can execute the publisher request:

wget --post-file=pubreq.xml --http-user=testuser@1 \
     --http-passwd=testuser http://localhost:9263/publisher/request

See the HTTP API documentation for more examples on using wget.

wget will save the response in a file, typically called request. You can open it in any text or XML editor, but to view it in a more readable form you can use:

xmllint --format request | less

4.12.5 About the instruction reference

In general, the reference of the publisher instructions only displays the request syntax, and not the format of the responses. Example responses can be easily obtained by executing a publisher request.

4.12.6 p:aclInfo

Returns the result of evaluating the current context document against the ACL.

4.12.6.1 Request

This request requires no attributes, so its syntax is simply:

<p:aclInfo/>

4.12.6.2 Response

The response is a d:aclResultInfo element.

4.12.7 p:annotatedDocument

Returns the XML representation of the current document with some annotations. The annotations include things like the name and label of the document type, display names for users, branch and language name (for all these things, otherwise only the numeric IDs would be present), and annotations to the fields.

4.12.7.1 Request

Syntax:

<p:annotatedDocument [inlineParts="..."] />

In contrast with most other elements, if you request the live version of the document but the document doesn't have a live version, this element will automatically fall back to the last version, so there will always be a d:document element in the response.

The p:annotatedDocument element can have an optional attribute called inlineParts, which can have as value "#all" or "#daisyHtml". In case #daisyHtml is specified, the part content of all Daisy-HTML parts will be inlined (though without any processing applied to them, thus unlike p:preparedDocuments). When #all is specified, the content of all parts whose mime-type starts with "text/" will be inlined. If it is an XML mimetype, the content is inserted as XML. [in the future, this attribute could be expanded so that it takes a comma-separated list of part type names whose content needs to be inlined]

4.12.7.2 Response

A d:document element with annotations.

4.12.8 p:annotatedVersionList

Returns a list of all versions of the document, with some annotations on top of the default XML representation of a version list.

4.12.8.1 Request

Syntax:

<p:annotatedVersionList/>

4.12.8.2 Response

A d:versions element.

4.12.9 p:availableVariants

Returns the available variants for the current context document.

4.12.9.1 Request

Syntax:

<p:availableVariants/>

4.12.9.2 Response

A d:availableVariants element.

4.12.10 p:choose

Executes one of a number of possible alternatives.

4.12.10.1 Request

Syntax:

<p:choose>
  <p:when test="...">
     [... any publisher instruction ...]
  </p:when>

  [... more p:when's ...]

  <p:otherwise>
    [... any publisher instruction ...]
  </p:otherwise>
</p:choose>

There should be at least one p:when, the p:otherwise is optional.

The test attribute contains an expression as in the where-clause of the query language. Since it always contains an expression, no ${...} should be used.

4.12.10.2 Response

p:choose itself doesn't generate any response, only the response from the executed alternative will be present in the output.
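
As a hedged illustration of the syntax above, the following sketch selects an instruction based on the document type of the context document. The document id 1-DSY and the document type name 'Navigation' are assumptions chosen for the example; substitute values that exist in your repository.

```xml
<?xml version="1.0"?>
<p:publisherRequest
  xmlns:p="http://outerx.org/daisy/1.0#publisher"
  locale="en-US">

  <!-- 1-DSY is an assumed document id -->
  <p:document id="1-DSY">
    <p:choose>
      <!-- test holds a plain where-clause expression, without ${...} -->
      <p:when test="documentType = 'Navigation'">
        <p:annotatedVersionList/>
      </p:when>
      <p:otherwise>
        <p:annotatedDocument/>
      </p:otherwise>
    </p:choose>
  </p:document>

</p:publisherRequest>
```

Only the output of the alternative that was executed should appear inside the p:document element of the response.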

4.12.11 p:comments

Returns the comments for the current context document.

4.12.11.1 Request

Syntax:

<p:comments/>

4.12.11.2 Response

A d:comments element. The newlines in the comments will be replaced with <br/> tags.

4.12.12 p:diff

Returns a diff of the current context document/version with another version of this document or another document.

4.12.12.1 Request

Syntax:

<p:diff contentDiffType="text|html|htmlsource">
  <p:otherDocument id="expr"  branch="expr" language="expr" version="expr"/>
</p:diff>

If no p:otherDocument element is specified, the diff will automatically be taken with the previous version of the document. If there is no such version (because the document has only one version yet), there will be no diff response.

If a p:otherDocument element is supplied, any combination of attributes is allowed, all attributes are optional.

The contentDiffType attribute is optional, text is the default. Specify 'html' for a visual HTML compare; 'htmlsource' does an inline HTML source diff (rather than a line-based diff). This attribute only has effect on Daisy-HTML parts.

4.12.12.2 Response

A diff-report element, except in the case no output is generated because the from or to version is not available.
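
A minimal sketch of a diff request, under stated assumptions: the document id 1-DSY and the presence of a version 1 are illustrative, and it is assumed that the attributes left out of p:otherDocument refer to the context document itself (the text above only states that all attributes are optional).

```xml
<?xml version="1.0"?>
<p:publisherRequest
  xmlns:p="http://outerx.org/daisy/1.0#publisher"
  locale="en-US">

  <!-- compare the last version of the assumed document 1-DSY ... -->
  <p:document id="1-DSY" version="last">
    <!-- ... with version 1, using the visual HTML compare -->
    <p:diff contentDiffType="html">
      <p:otherDocument version="1"/>
    </p:diff>
  </p:document>

</p:publisherRequest>
```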

4.12.13 p:document

A p:document request pushes a document on the context document stack, and thus changes the currently active context document. The context document is the document to which the document-related requests apply.

4.12.13.1 Request

Any publisher request element can be nested within p:document.

The p:document request can work in three ways:

4.12.13.1.1 (1) Specify the document explicitly

Syntax:

<p:document id="..." branch="..." language="..." version="...">
  [ ... child instructions ... ]
</p:document>

Using the attributes id, branch (optional), language (optional) and version (optional) the new context document is specified.

The branch and language can be specified either by name or ID. If not specified, they default to the main branch and default language.

The version can be specified as "live" (the default), "last" or an explicit version number.

4.12.13.1.2 (2) Specify a field attribute

Syntax:

<p:document field="..." hierarchyElement="...">
  [ ... child instructions ... ]
</p:document>

When the p:document is used in a location where there is already a context document (e.g. from a parent p:document), it is possible to use an attribute called field. The value of this attribute should be the name of a link-type field. This p:document request will then change the context document to the document specified in that link-type field in the current context document. If the current context document does not have such a field, the p:document request will be silently skipped. If the link-type field is a multivalue field, the p:document request will be executed once for each value of the multivalue field. This will then lead to multiple sibling p:document elements in the publisher response.

Exactly the same can also be achieved through the p:forEach request. In fact, the internal implementation uses p:forEach so this is just an alternative (older) syntax for the same thing.

See also p:forEach for an explanation of the hierarchyElement attribute.

4.12.13.1.3 (3) Implicit

Sometimes the p:document instruction is used as the child of other instructions such as p:forEach or p:navigationTree. In that case the context document is determined by the parent instruction, and should hence not be specified.

4.12.13.2 Response

The p:document request will output a p:document element in the publisherResponse. This element will have attributes describing the exact document variant and version that the context document was changed to.
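
The first two forms can be combined, as in the following sketch. The outer p:document names the document explicitly, the inner one follows a link-type field of that document. The document id 1-DSY and the field name RelatedDocument are hypothetical and only serve the example.

```xml
<?xml version="1.0"?>
<p:publisherRequest
  xmlns:p="http://outerx.org/daisy/1.0#publisher"
  locale="en-US">

  <!-- form (1): explicit document; 1-DSY is an assumed id -->
  <p:document id="1-DSY">
    <p:annotatedDocument/>

    <!-- form (2): RelatedDocument is a hypothetical link-type field;
         if the context document lacks it, this element is skipped -->
    <p:document field="RelatedDocument">
      <p:annotatedDocument/>
    </p:document>
  </p:document>

</p:publisherRequest>
```

If RelatedDocument were a multivalue field, the response would be expected to contain one nested p:document element per value.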

4.12.14 p:forEach

Executes publisher instructions for each document in a list of documents. The list of documents on which to operate can either result from a query or an expression.

4.12.14.1 Request

4.12.14.1.1 Query

Syntax:

<p:forEach useLastVersion="true|false">
  <p:query>select ... where ... order by ...</p:query>
  <p:document>
    [ ... child instructions ... ]
  </p:document>
</p:forEach>

If the useLastVersion attribute is false or not specified, the live version of each document will be used, otherwise the last version.

If there is a context document available then the ContextDoc function can be used in the query or expression.

4.12.14.1.2 Expression

Syntax:

<p:forEach useLastVersion="true|false">
  <p:expression [precompile="true|false"] [hierarchyElement="all|an integer"]>...</p:expression>
  <p:document>
    [ ... child instructions ... ]
  </p:document>
</p:forEach>

The expression is an expression using the Daisy query language syntax, and should return link values.

The optional precompile attribute indicates whether the expression should be compiled just once or recompiled upon each execution. Usually one should leave this to its default true value.

When the value returned by the expression is a hierarchy path (or a multivalue of hierarchy paths), then p:forEach will by default run over all the values in the hierarchy path. The optional hierarchyElement attribute can be used to select just one element from the hierarchy path. This attribute accepts integer values and 'all'. If you do not specify the hierarchy element attribute then 'all' will be used by default.

An illustration. Consider the following hierarchy: /domain/kingdom/subkingdom/branch/infrakingdom. Using 2 as hierarchyElement will get you documents at the level of kingdom. Using -2 will get you documents at the level of branch. As you can see the hierarchyElement specification is 1-based. Choosing level 12 will get you nothing since the hierarchy does not have 12 levels.

4.12.14.1.2.1 Examples

If the current context document has a link field (single or multi-value) called MyField, then you could run over its linked documents as follows:

<p:forEach useLastVersion="true">
  <p:expression>$MyField</p:expression>
  <p:document>
    <p:annotatedDocument/>
  </p:document>
</p:forEach>

If you have a number of documents which are linked in a chain using a certain field (e.g. Category documents with a link field pointing to their parent category), then the GetLinkPath function is a useful tool:

<p:expression>GetLinkPath('CategoryParent')</p:expression>
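
The hierarchyElement attribute can be sketched in the same way. The example below assumes a hypothetical hierarchical link field called Taxonomy on the context document and an illustrative document id 1-DSY. With hierarchyElement="-2", only the second-to-last element of each hierarchy path is visited.

```xml
<?xml version="1.0"?>
<p:publisherRequest
  xmlns:p="http://outerx.org/daisy/1.0#publisher"
  locale="en-US">

  <!-- 1-DSY is an assumed document id -->
  <p:document id="1-DSY">
    <p:forEach>
      <!-- Taxonomy is a hypothetical hierarchical link field;
           -2 selects the second element counted from the end of the path -->
      <p:expression hierarchyElement="-2">$Taxonomy</p:expression>
      <p:document>
        <p:annotatedDocument/>
      </p:document>
    </p:forEach>
  </p:document>

</p:publisherRequest>
```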

4.12.14.2 Response

Zero or more p:document elements.

4.12.15 p:group

The p:group element acts as a container for other instructions. It makes it possible to distinguish between, for example, different query or navigation tree results when a request contains more than one of them.

4.12.15.1 Request

Syntax:

<p:group id="expr" catchErrors="true|false">

 [... child instructions ...]

</p:group>

The id attribute is required.

The catchErrors attribute is optional. When this attribute has the value true, any errors that occur during the processing of the children of the p:group element will be caught. The result will then be a p:group element with an attribute error="true" and as child the stacktrace of the error. When catchErrors is true, the result of the execution of the children of the p:group element will need to be buffered temporarily, so only use this when you really need it.

4.12.15.2 Response

A p:group element with id attribute, and contained in it the output of the child instructions.
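
A sketch of how two query results might be kept apart with p:group. The group ids, the collection name 'mycollection' and the document type 'Navigation' are assumptions made for the example.

```xml
<?xml version="1.0"?>
<p:publisherRequest
  xmlns:p="http://outerx.org/daisy/1.0#publisher"
  locale="en-US">

  <!-- errors in this group are caught and reported inside the group -->
  <p:group id="navigationDocs" catchErrors="true">
    <p:performQuery>
      <p:query>select id, name where documentType = 'Navigation' order by name</p:query>
    </p:performQuery>
  </p:group>

  <!-- second result set, told apart from the first by its id -->
  <p:group id="collectionDocs">
    <p:performQuery>
      <p:query>select id, name where InCollection('mycollection') order by name</p:query>
    </p:performQuery>
  </p:group>

</p:publisherRequest>
```

A consumer of the response can then locate each result by the id attribute of the enclosing p:group element.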

4.12.16 p:ids

Returns the list of all values of the id attributes occurring in the Daisy-HTML parts of the current context document. This can be useful in editors to show the user a list of possible fragment identifier values.

4.12.16.1 Request

Syntax:

<p:ids/>

4.12.16.2 Response

<p:ids>
  [ zero or more child p:id elements ]
  <p:id>....</p:id>
</p:ids>

4.12.17 p:if

Executes a part of the publisher request only if a certain test is satisfied.

4.12.17.1 Request

Syntax:

<p:if test="...">
  [... child instructions ...]
</p:if>

The test attribute specifies a conditional expression (an expression evaluating to true or false) in the same format as used in the Daisy query language.

For example:

<p:if test="$MyField > 20">
  ...
</p:if>

4.12.17.2 Response

If the test evaluated to true, the output of the child instructions, otherwise nothing.

4.12.18 p:myComments

Returns a list of all private comments of the user.

4.12.18.1 Request

Syntax:

<p:myComments/>

4.12.18.2 Response

Same as for p:comments.

4.12.19 p:navigationTree

Requests a navigation tree from the Navigation Manager.

4.12.19.1 Request

The full form of this request is:

<p:navigationTree>
  <p:navigationDocument id="expr" branch="expr" language="expr"/>
  <p:activeDocument id="expr" branch="expr" language="expr"/>
  <p:activePath>expr</p:activePath>
  <p:contextualized>true|false</p:contextualized>
  <p:depth>...</p:depth>
  <p:versionMode>live|last</p:versionMode>
  <p:document>...</p:document>
</p:navigationTree>

The elements p:activeDocument, p:activePath, p:versionMode, p:depth and p:document are optional.

To make the active document the current context document, one can use expressions like this:

<p:navigationTree>
  <p:navigationDocument id="338-DSY"/>
  <p:activeDocument id="${id}" branch="${branch}" language="${language}"/>
</p:navigationTree>

4.12.19.1.1 Attaching additional information to the document nodes

If a p:document instruction is present, then the p:document will be executed for each document node in the navigation tree, and the response is inserted as first child of each element. This provides the ability to annotate the document nodes in the navigation tree with additional information of the document, so that you can e.g. know the document type of the document. One could even aggregate the full content of the documents, if desired.

An example:

<p:navigationTree>
  <p:navigationDocument id="338-DSY"/>
  <p:document>
    <p:annotatedDocument/>
  </p:document>
</p:navigationTree>

4.12.19.2 Response

The response of this instruction is a n:navigationTree element, with n denoting the navigation namespace.

4.12.20 p:performFacetedQuery

Returns the result of executing a faceted query.

4.12.20.1 Request

Syntax:

  <p:performFacetedQuery>
    <p:options>
      <p:additionalSelects>
        <p:expression>name</p:expression>
        <p:expression>summary</p:expression>
      </p:additionalSelects>
      <p:defaultConditions>documentType='SimpleDocument'</p:defaultConditions>
      <p:defaultSortOrder>name asc</p:defaultSortOrder>
      <p:queryOptions>
        <p:queryOption name="include_retired" value="true"/>
      </p:queryOptions>
    </p:options>
    <p:facets>
      <p:facet expression="variantLastModifierLogin" maxValues="10" sortOnValue="true" sortAscending="false" type="default">
        <p:properties>
          <p:property name="" value=""/>
        </p:properties>
      </p:facet>
    </p:facets>
  </p:performFacetedQuery>

This syntax is similar to the one for the faceted browser definition.

4.12.20.1.1 Options

All options are, as the name says, optional. The p:options element itself, however, is required.

4.12.20.1.2 Facets

Each facet element defines a facet.
Attributes :

4.12.20.2 Response

The result of a query is a d:facetedQueryResult element.

4.12.21 p:performQuery

Returns the result of executing a query.

4.12.21.1 Request

Syntax:

<p:performQuery>
  <p:query>select ... where ... order by ...</p:query>

  [ optional elements: ]
  <p:extraConditions>...</p:extraConditions>
  <p:document>...</p:document>

</p:performQuery>

If there is a context document available (i.e. if this p:performQuery is used inside a p:document) then the ContextDoc function can be used in the query.

The optional p:extraConditions element specifies additional conditions that will be AND-ed with those in the where clause of the query. This feature is not often needed. It can be useful when you let the user enter a free query but want to enforce some condition, e.g. limit the documents to a certain collection.

The optional p:document element can contain publisher instructions that will be performed for each row in the query result set; their results will be inserted as child elements of the row elements. On each result set row, the document corresponding to the row is pushed as context document. This functionality is similar to what can be done with p:forEach, but has the advantage that information about the query such as chunk offset and size stays available.

4.12.21.2 Response

The result of a query is a d:searchResult element.
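
The optional elements can be combined as in the sketch below. The document type 'SimpleDocument' and the collection name 'mycollection' are assumptions for the example. The extra condition is AND-ed with the where clause, and p:aclInfo is executed once for each row of the result.

```xml
<?xml version="1.0"?>
<p:publisherRequest
  xmlns:p="http://outerx.org/daisy/1.0#publisher"
  locale="en-US">

  <p:performQuery>
    <p:query>select id, name where documentType = 'SimpleDocument' order by name</p:query>

    <!-- enforced restriction, AND-ed with the where clause above -->
    <p:extraConditions>InCollection('mycollection')</p:extraConditions>

    <!-- executed for each row, with that row's document as context document -->
    <p:document>
      <p:aclInfo/>
    </p:document>
  </p:performQuery>

</p:publisherRequest>
```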

4.12.22 p:preparedDocuments (& p:prepareDocument)

p:preparedDocuments is the most powerful of all publisher requests. It returns the content of the document prepared for publishing. The preparation consists of all sorts of things such as:

A very important point of p:preparedDocuments is that it is able to use secondary publisher requests for the requested document and each included document. The publisher requests to use are determined based on a mapping file, and make it possible to aggregate additional information with the document based on e.g. its document type.

4.12.22.1 Request

Syntax:

<p:preparedDocuments publisherRequestSet="..."
                     displayContext="free string"
                     applyDocumentTypeStyling="true|false">
  <p:navigationDocument id="..." branch="..." language="..."/>
</p:preparedDocuments>

The applyDocumentTypeStyling and displayContext attributes are not used by the publisher, but are simply replicated in the result.

In the Daisy Wiki, their purposes are:

The publisherRequestSet attribute: see below.

The p:navigationDocument element is optional. If supplied, it makes it possible to annotate "daisy:" links with the path where they occur in the navigation tree.

4.12.22.1.1 How it works

p:preparedDocuments looks up a new publisher request to be performed on the context document. The publisher request to be used can be determined dynamically (described further on), but by default it is this one:

<p:publisherRequest xmlns:p="http://outerx.org/daisy/1.0#publisher">
  <p:prepareDocument/>
  <p:aclInfo/>
  <p:subscriptionInfo/>
</p:publisherRequest>

This publisher request should (usually) contain a <p:prepareDocument/> element. The p:prepareDocument will be replaced by the Daisy document as XML (d:document), in which the HTML content is inlined and processed (i.e. the things mentioned in the enumeration above). If the content contains an include of another document, then for this included document the publisher will again determine a publisher request to be performed upon it, and execute it. The same happens for each include (recursively). The results of all these publisher requests are inserted in the publisher response in a structure like this:

<p:preparedDocuments applyDocumentTypeSpecificStyling="true|false">

  <p:preparedDocument id="1">
    <p:publisherResponse>
      <d:document ...
    </p:publisherResponse>
  </p:preparedDocument>

  <p:preparedDocument id="2">
    <p:publisherResponse>
      <d:document ...
    </p:publisherResponse>
  </p:preparedDocument>

</p:preparedDocuments>

The publisher response of the context document will always end up in the p:preparedDocument element with attribute id="1".  If the document includes no other documents, this will be the only p:preparedDocument. Otherwise, for each included document (directly or indirectly), an additional p:preparedDocument element will be present.

So the included documents are not returned in a nested structure, but as a flat list. This allows to perform custom styling on each separate document before nesting them.

On the actual location of an include, a p:daisyPreparedInclude element is inserted, with an id attribute referencing the related p:preparedDocument element.

The content of a p:preparedDocument element is thus a single p:publisherResponse element, which in turn contains a single d:document element (as result of the p:prepareDocument in the publisher request). This d:document element follows the standard form of XML as is otherwise retrieved via the HTTP interface or by using the getXml() method on a document object, but with lots of additions such as inlined content for Daisy-HTML parts and non-string attribute values formatted according to the specified locale.
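
The flat structure described above can be walked with a few lines of DOM code. What follows is a minimal sketch in plain Java, not part of the Daisy API: the class name, the method name and the hand-written sample response are our own assumptions. The sample is simplified, since in a real response the p:daisyPreparedInclude elements sit inside the d:document content.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class PreparedDocumentsSketch {
    static final String NS = "http://outerx.org/daisy/1.0#publisher";

    // Simplified, hand-written response: document 1 includes 2, which includes 3.
    static final String SAMPLE =
          "<p:preparedDocuments xmlns:p=\"" + NS + "\">"
        + "<p:preparedDocument id=\"1\"><p:publisherResponse>"
        + "<p:daisyPreparedInclude id=\"2\"/>"
        + "</p:publisherResponse></p:preparedDocument>"
        + "<p:preparedDocument id=\"2\"><p:publisherResponse>"
        + "<p:daisyPreparedInclude id=\"3\"/>"
        + "</p:publisherResponse></p:preparedDocument>"
        + "<p:preparedDocument id=\"3\"><p:publisherResponse/></p:preparedDocument>"
        + "</p:preparedDocuments>";

    /** Maps each p:preparedDocument id to the ids of the includes it references. */
    static Map<String, List<String>> includesById(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            Map<String, List<String>> result = new LinkedHashMap<>();
            NodeList prepared = doc.getElementsByTagNameNS(NS, "preparedDocument");
            for (int i = 0; i < prepared.getLength(); i++) {
                Element preparedDoc = (Element) prepared.item(i);
                List<String> references = new ArrayList<>();
                // The list is flat, so a descendant search only finds this document's own includes.
                NodeList includes = preparedDoc.getElementsByTagNameNS(NS, "daisyPreparedInclude");
                for (int j = 0; j < includes.getLength(); j++) {
                    references.add(((Element) includes.item(j)).getAttribute("id"));
                }
                result.put(preparedDoc.getAttribute("id"), references);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse publisher response", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(includesById(SAMPLE));
    }
}
```

With such a map in hand, a caller could style each p:preparedDocument separately and then substitute the styled results at the include locations, starting from id="1".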

If you requested the live version of the document, but the document does not have a live version, there will simply be no p:preparedDocuments element in the response.

4.12.22.1.2 Determination of the publisher request to be performed

If instead of the default publisher request mentioned above, you want to execute some custom publisher request (which can be used to retrieve information related to the document being published), then this is possible by defining a publisher request set.

In the data directory of the repository server, you will find a subdirectory called 'pubreqs'. In this directory, each subdirectory specifies a publisher request set. Each such subdirectory should contain a file called mapping.xml and one or more other files containing publisher requests.

The mapping.xml file looks like this:

<m:publisherMapping xmlns:m="http://outerx.org/daisy/1.0#publishermapping">
  <m:when test="documentType = 'mydoctype'" use="myrequest.xml"/>
  <m:when test="true" use="default.xml"/>
</m:publisherMapping>

The publisher will run over each of the when rules, and if the expression in the test attribute matches, it will use the publisher request specified in the use attribute. The expressions are the same as used in the query language, and thus also the same as used in the ACL definition.
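
To make the rule selection concrete, here is a minimal sketch in plain Java. It is not Daisy's implementation: the class and method names are hypothetical, we assume that the first matching rule wins, and the evaluation of the test expression (done by the repository in reality) is replaced by a caller-supplied predicate.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Predicate;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class PublisherMappingSketch {
    static final String NS = "http://outerx.org/daisy/1.0#publishermapping";

    // The mapping.xml shown above, as a Java string.
    static final String SAMPLE =
          "<m:publisherMapping xmlns:m=\"" + NS + "\">"
        + "<m:when test=\"documentType = 'mydoctype'\" use=\"myrequest.xml\"/>"
        + "<m:when test=\"true\" use=\"default.xml\"/>"
        + "</m:publisherMapping>";

    /**
     * Returns the 'use' attribute of the first m:when rule whose test
     * expression is accepted by the given evaluator, or null if none matches.
     * The evaluator is a stand-in for the real expression evaluation.
     */
    static String selectRequest(String mappingXml, Predicate<String> evaluator) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(mappingXml.getBytes(StandardCharsets.UTF_8)));
            NodeList rules = doc.getElementsByTagNameNS(NS, "when");
            for (int i = 0; i < rules.getLength(); i++) {
                Element rule = (Element) rules.item(i);
                if (evaluator.test(rule.getAttribute("test"))) {
                    return rule.getAttribute("use"); // assumption: first match wins
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse mapping", e);
        }
    }

    public static void main(String[] args) {
        // Pretend the document is not of type 'mydoctype': only the literal "true" matches.
        Predicate<String> evaluator = test -> test.equals("true");
        System.out.println(selectRequest(SAMPLE, evaluator));
    }
}
```

The catch-all rule with test="true" at the end plays the role of a default, which is why it is listed last in the example mapping.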

To make use of such a specific set of publisher requests, you use the publisherRequestSet attribute on the p:preparedDocuments element. The value of this attribute should correspond to the name of a subdirectory of the pubreqs directory.

In the Daisy Wiki, the publisher request set to be used can be specified in the siteconf.xml.

4.12.22.1.3 p:prepareDocument

The p:prepareDocument can have an optional attribute called inlineParts. This attribute specifies a comma-separated list of part type names (or IDs) for which the content should be inlined. By default this only happens for parts for which the Daisy-HTML flag is set to true.

The inlining will only happen if the actual part has a mime-type that starts with "text/" or if the mime-type is recognized as an XML mime-type. Recognized XML mime-types are currently text/xml, application/xml or any mime type ending with "+xml".

If it is an XML mime-type, then the content will be parsed and inserted as XML. Otherwise, it will be inserted as character data (assuming UTF-8 encoding of the part text data). If the inlining actually happened, an attribute inlined="true" is added to the d:part element in question.
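
The mime-type rules above can be summarized in a few lines of code. This is only an illustrative sketch in plain Java, not the publisher's actual code: the class and method names are hypothetical, and the case-insensitive comparison is our own assumption. Mime types carrying parameters (such as a charset) are not handled.

```java
import java.util.Locale;

public class InlineDecision {

    /** XML mime types as listed in the text: text/xml, application/xml, or anything ending in "+xml". */
    static boolean isXmlMimeType(String mimeType) {
        String m = mimeType.trim().toLowerCase(Locale.ROOT);
        return m.equals("text/xml") || m.equals("application/xml") || m.endsWith("+xml");
    }

    /** A part can be inlined when its mime type starts with "text/" or is an XML mime type. */
    static boolean isInlineable(String mimeType) {
        String m = mimeType.trim().toLowerCase(Locale.ROOT);
        return m.startsWith("text/") || isXmlMimeType(m);
    }

    /** Describes how the part content would end up in the d:part element. */
    static String insertionMode(String mimeType) {
        if (!isInlineable(mimeType)) {
            return "not inlined";
        }
        return isXmlMimeType(mimeType) ? "parsed as XML" : "character data";
    }

    public static void main(String[] args) {
        String[] samples = {"text/plain", "text/xml", "image/svg+xml", "image/gif"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + insertionMode(sample));
        }
    }
}
```

Note that this check only applies to parts that are selected for inlining in the first place, either through the inlineParts attribute or through the Daisy-HTML flag.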

4.12.23 p:publisherRequest

p:publisherRequest is the root element of a publisher request document.

A basic, empty publisher request is structured as follows:

<p:publisherRequest
    xmlns:p="http://outerx.org/daisy/1.0#publisher"
    locale="en-US"
    versionMode="live"
    exceptions="throw">

 [... various publisher requests ...]

</p:publisherRequest>

The response of a publisher request is structured as follows:

<p:publisherResponse
    xmlns:p="http://outerx.org/daisy/1.0#publisher">

 [... responses to the various requests ...]

</p:publisherResponse>

About the attributes on the p:publisherRequest element:

The locale attribute is optional, by default the en-US locale will be used. If the publisher request is executed as part of another publisher request (see p:preparedDocuments), the locale will default to the locale of the 'parent' publisher request.

The versionMode attribute is optional, valid values are live (the default) or last. This attribute indicates which version of a document should be used by default if no explicit version is indicated. If its value is 'last', it will also cause the option 'search_last_version' to be set for various queries (those embedded in documents when requesting a preparedDocument, those executed by p:performQuery or p:forEach). It will also influence the version mode of the navigation tree when using p:navigationTree.

The exceptions attribute is also optional, throw is its default value. Basically this means that if an exception occurs during the processing of the publisher request, it will be thrown. It is also possible to specify inline, in which case the error description will be embedded in the p:publisherResponse element, but no exception will be thrown.

The styleHint attribute (optional, not shown above) will simply be replicated on the p:publisherResponse element. The publisher itself does not interpret the value of this attribute, it can be used by the caller of the publisher to influence the styling process. In the case of the Daisy Wiki, the styleHint attribute can contain the name of an alternative document-styling XSL to use (instead of the default doctype-name.xsl).

4.12.24 p:resolveDocumentIds

This element allows to retrieve the names of a set of documents of which you have only the ID. The advantage compared to using simply the repository API is that this only requires one remote call for as many documents as you need (assuming you are using the remote API, otherwise it does not make a difference).

4.12.24.1 Request

Syntax:

<p:resolveDocumentIds branch="..." language="...">
  <p:document id="..." branch="..." language="..." version="..."/>
</p:resolveDocumentIds>

The branch and language attributes on the p:resolveDocumentIds element specify the default branch and language to use if it is not specified on the individual documents. These attributes are optional, the main branch and default language is used as default when these attributes are not specified.

The branch, language and version attributes on the p:document element are optional. By default the live version is used, or the last version if the document does not have a live version.

4.12.24.2 Response

The result has the following format:

<p:resolvedDocumentIds>
  <p:document id="..." branch="..." language="..." version="..." name="..."/>
</p:resolvedDocumentIds>

The id, branch, language and version attributes are simply copied from the requesting document element. The name attribute is added. The p:document elements in the result correspond to those in the request at the same position.

If there is some error (such as the document does not exist, or the specified branch or language does not exist), an error message is put in the name attribute.

4.12.25 p:resolveVariables

Resolves variables in the specified text strings.

4.12.25.1 Request

<p:resolveVariables>
  <p:text>...</p:text>
  ... more p:text elements ...
</p:resolveVariables>

Variables should be embedded in the text using ${varname} syntax ($$ is used to escape $).

4.12.25.2 Response

<p:resolvedVariables>
  <p:text>...</p:text>
  ... more p:text elements ...
</p:resolvedVariables>

Each p:text element in the response corresponds to the p:text element in the request at the same position.
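
To illustrate the ${varname} syntax and the $$ escape, here is a small stand-alone sketch in plain Java. It is not the publisher's implementation: the class name is hypothetical, the variable values come from a simple map, and leaving unknown variables untouched is our own assumption.

```java
import java.util.HashMap;
import java.util.Map;

public class VariableResolverSketch {

    /** Replaces ${name} with its value and turns $$ into a single $. */
    static String resolve(String text, Map<String, String> variables) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '$' && i + 1 < text.length()) {
                char next = text.charAt(i + 1);
                if (next == '$') {
                    // "$$" is an escaped "$"
                    out.append('$');
                    i += 2;
                    continue;
                }
                if (next == '{') {
                    int end = text.indexOf('}', i + 2);
                    if (end != -1) {
                        String name = text.substring(i + 2, end);
                        String value = variables.get(name);
                        // Assumption: an unknown variable is copied to the output as-is.
                        out.append(value != null ? value : text.substring(i, end + 1));
                        i = end + 1;
                        continue;
                    }
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> variables = new HashMap<>();
        variables.put("product", "Daisy");
        // prints: Welcome to Daisy, price: $5
        System.out.println(resolve("Welcome to ${product}, price: $$5", variables));
    }
}
```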

4.12.26 p:selectionList

This instruction allows to retrieve the selection list of a field type.

4.12.26.1 Request

Syntax:

<p:selectionList fieldType="..." branch="expr" language="expr"/>

The fieldType attribute can contain either the name or ID of the field type.

The branch and language attributes are optional, if not present they default to those of the context document, if any, and otherwise to the main branch and default language. The branch and language only make a difference when the selection list implementation needs them, e.g. for query selection lists with "filter variants automatically" behaviour.

4.12.26.2 Response

If the field type has no selection list, the output will contain nothing, otherwise a d:expSelectionList element will be present. The 'exp' prefix stands for 'expanded', in contrast to the definition of the selection list (which can e.g. be a query).

4.12.27 p:shallowAnnotatedVersion

Returns the shallow version XML, this is the version XML without field and part information in it.

4.12.27.1 Request

Syntax:

<p:shallowAnnotatedVersion/>

4.12.27.2 Response

A d:version element.

If you requested the live version of the document, but the document does not have a live version, there will simply be no d:version element in the response.

4.12.28 p:subscriptionInfo

Returns whether the user is subscribed for email notifications on the current context document.

4.12.28.1 Request

Syntax:

<p:subscriptionInfo/>

4.12.28.2 Response

The response is the same element with the actual subscription status added:

<p:subscriptionInfo subscribed="true|false"/>

4.12.29 p:variablesConfig

This is not a publisher instruction, but rather configuration information for the variable resolution.

With "variables" we mean the variables that can be embedded in Daisy-HTML parts and document names. See [todo] for more information on this topic.

The p:variablesConfig element can only occur as first child of p:publisherRequest.

4.12.29.1 Syntax

The syntax is as follows:

<p:variablesConfig>

  <p:variableSources>
    <p:variableDocument id="..." branch="..." language="..."/>
    [... more p:variableDocument elements ...]
  </p:variableSources>

  <p:variablesInAttributes [allAttributes="true|false"] >
     <p:element name="img" attributes="daisy-caption,alt"/>
     [... more p:element elements ...]
  </p:variablesInAttributes>

</p:variablesConfig>

All elements are optional.

The p:variableDocument elements point to documents containing the variable-to-value mappings. These are documents containing a part VariablesData. The XML format that should be in there is documented in the variables documents, it is not repeated here to avoid duplication.

The p:variablesInAttributes element configures whether variables should be resolved in attributes. If this element is not present, no variable resolving in attributes will happen. For the case where speed is a concern, you can configure which attributes on which elements should be considered.

4.12.29.2 Effect of p:variablesConfig

When a p:variablesConfig is specified, variable resolution will be enabled for a number of cases:

4.12.30 p:variablesList

Returns a list of all defined variables, according to the active p:variablesConfig of the current publisher request. This is mostly useful to let editors pick variables from the list of available variables.

4.12.30.1 Request

This request requires no attributes, so its syntax is simply:

<p:variablesList/>

4.12.30.2 Response

The response looks like this:

<p:variablesList>
  <p:variable name="...">...the value...</p:variable>
  ... more p:variable elements ...
</p:variablesList>

If there are no variables, an empty p:variablesList element will be present in the response.

4.13 Backup locking

The practical side of making backups is explained in the section Making backups. Here we only describe the backup-lock mechanism, which is of use if you want to write your own backup tool.

The repository server uses multiple storages: a relational database, the blobstore (a filesystem directory), and the full text indexes (also stored on the filesystem). Next to that, the JMS system also has its own database. Daisy does not employ some fancy distributed transaction manager with associated log, therefore it has its own simple mechanism to allow to take a consistent backup of these different stores. Daisy allows to take a "backup lock" on the repository server. This will:

A backup lock is requested via the JMX interface. It needs a "timeout" parameter that specifies how long to wait for any running operations to end.

While the backup lock is active, all read operations will continue to work, and update operations that only involve the repository's relational database will continue to work. Operations that require an update to the blobstore (such as saving a document in most cases) will give an exception.

4.14 Image thumbnails and metadata extraction

The repository server contains an (optional) component that can perform image thumbnailing, extraction of metadata (width, height and, for JPEG, arbitrary Exif fields), and automatic rotation of JPEG images as indicated in the Exif data. This component is registered as a "pre-save-hook", this is a component which gets called before a document is saved, and which can modify the content of the to-be-saved document.

For further reference, we will call this component the image hook.

The image hook can be configured to react on multiple document types, and for each document type it is possible to specify what needs to be done:

By default, the image hook is configured to handle the Image document type as defined by the Daisy Wiki.

The configuration of the image hook can be adjusted via the myconfig.xml configuration file of the repository server.

The image hook will only perform its work if the part containing the original image data changed, or if any of the to-be extracted information is missing in the target document. Therefore, if for some reason you want to trigger the image hook to redo its work, you need to make sure one of these conditions is true.

4.15 Programming interfaces

The native API of the Daisy repository server is its Java interface. To allow other processes (on the same or another computer) to talk to the repository server, a HTTP+XML based interface is available. Lastly, the Java API of the Daisy repository server is also implemented in a "remote" variant, whereby it transparently uses the HTTP+XML interface to talk to the repository server.

Since a variety of scripting languages can be run on top of the Java VM, it is possible to use the Java API from such scripting languages, which is convenient for smaller jobs.

4.15.1 Java API

4.15.1.1 Introduction

Daisy is written in Java and thus its native interface is a Java API. This Java API is packaged separately, and consists of two jars:

daisy-repository-api-<version>.jar
daisy-repository-xmlschema-bindings-<version>.jar

The second jar, the xmlschema-bindings, contains Java classes generated from XML Schemas, which form a part of the API. To write client code that talks to Daisy, at compile time you need only the above two jars in the classpath (at runtime, you need a concrete implementation, see further on).

There are two implementations of this API available:

This is illustrated in the diagram below.

To be workable, the remote implementation caches certain information: the repository schema (document, field and part types), the collections, and the users (needed to be able to quickly map user IDs to user names). To be aware of changes done by other clients, the remote implementation can listen to the JMS events broadcast by the server to update these caches. This is optional, for example a short-running client application that performs a specific task probably doesn't care much about this, especially since the cached information is not the kind of information that changes frequently. Even when JMS-based cache invalidation is disabled, the caches of a certain remote implementation instance are of course kept up-to-date for changes done through that specific instance.

Examples of applications making use of the remote API implementation are the Daisy Wiki, and the installation utilities daisy-wiki-init and daisy-wiki-add-site. Especially the source of those last two can serve as useful but simple examples of how to write client applications. The shell scripts to launch them show the required classpath libraries.

4.15.1.2 Reference documentation

See the javadoc documentation.

4.15.1.3 Quick introduction to the Java API

The Daisy Java API is quite high-level, and thus easy-to-use. The start point to do any work is the RepositoryManager interface, which has the following method:

Repository getRepository(Credentials credentials) throws RepositoryException;

The Credentials parameter is simply an object containing the user name and password. By calling the getRepository method, you get an instance of Repository, through which you can access all the repository functionality. The obtained Repository instance is specific for the user specified when calling getRepository. The Repository object does not need to be released after use. It is a quite lightweight object, mainly containing the authentication information.

Let's have a look at some of the methods of the Repository interface.

Document createDocument(String name, String documentTypeName);

Creates a new document with the given name, and using the named document type. The document is not immediately created in the repository, to do this you need to call the save() method on the Document. But first you need to set all required fields and parts, otherwise the save will fail (it is possible to circumvent this, see the full javadocs).

Document getDocument(long documentId, boolean updateable) throws RepositoryException;

Retrieves an existing document, specified by its ID. If the flag 'updateable' is false, the repository will return a read-only Document object, which allows it to return a shared cached copy. In the remote implementation, this doesn't matter since it doesn't perform any caching, but in the local implementation this can make a huge difference.

RepositorySchema getRepositorySchema();

Returns an instance of RepositorySchema, through which you can inspect and modify the repository schema (these are the document, part and field types).

AccessManager getAccessManager();

Returns an instance of AccessManager, through which you can inspect and modify the ACL, and get the ACL evaluation result for a certain document-user-role combination.

QueryManager getQueryManager();

Returns an instance of QueryManager, through which you can perform queries on the repository using the Daisy Query Language.

CollectionManager getCollectionManager();

Returns an instance of CollectionManager, through which you can create, modify and delete document collections.

UserManager getUserManager();

Returns an instance of UserManager, through which you can create, modify and delete users.

The above was just to give a broad idea of the functionality available through the API. For more details, consult the complete JavaDoc of the API.

4.15.1.4 Writing a Java client application

Let's now look at a practical example.

Here's a list of jars you need in the CLASSPATH to use the remote repository API implementation.

This list is bound to change in different Daisy releases. We advise you to copy the CLASSPATH settings defined in the daisy-js script (daisy-js.bat on Windows), possibly removing the 'rhino' and 'daisy-javascript' jars.
The list below was last updated for Daisy 1.5.1:

DAISY_HOME/lib/daisy/jars/daisy-repository-api-1.5.jar
DAISY_HOME/lib/daisy/jars/daisy-repository-client-impl-1.5.jar
DAISY_HOME/lib/daisy/jars/daisy-repository-spi-1.5.jar
DAISY_HOME/lib/daisy/jars/daisy-util-1.5.jar
DAISY_HOME/lib/avalon-framework/jars/avalon-framework-api-4.1.5.jar
DAISY_HOME/lib/daisy/jars/daisy-repository-common-impl-1.5.jar
DAISY_HOME/lib/commons-httpclient/jars/commons-httpclient-2.0.2.jar
DAISY_HOME/lib/xmlbeans/jars/xbean-2.1.0.jar
DAISY_HOME/lib/xmlbeans/jars/xmlpublic-2.1.0.jar
DAISY_HOME/lib/stax/jars/stax-api-1.0.jar
DAISY_HOME/lib/daisy/jars/daisy-repository-xmlschema-bindings-1.5.jar
DAISY_HOME/lib/concurrent/jars/concurrent-1.3.2.jar
DAISY_HOME/lib/commons-logging/jars/commons-logging-1.0.4.jar
DAISY_HOME/lib/commons-collections/jars/commons-collections-3.1.jar
DAISY_HOME/lib/daisy/jars/daisy-jmsclient-api-1.5.jar
DAISY_HOME/lib/geronimo-spec/jars/geronimo-spec-jms-1.1-rc4.jar

(below only if you need the htmlcleaner)

DAISY_HOME/lib/daisy/jars/daisy-htmlcleaner-1.5.jar
DAISY_HOME/lib/nekohtml/jars/nekohtml-0.9.5.jar
DAISY_HOME/lib/nekodtd/jars/nekodtd-0.1.11.jar
DAISY_HOME/lib/xerces/jars/xercesImpl-2.6.2.jar
DAISY_HOME/lib/xerces/jars/xmlParserAPIs-2.2.1.jar

So depending on your own habits, you could set up a project in your IDE with these jars in the classpath, or make an Ant project, or whatever.

Below a simple and harmless example is shown: performing a query on the repository.

package mypackage;

import org.outerj.daisy.repository.RepositoryManager;
import org.outerj.daisy.repository.Credentials;
import org.outerj.daisy.repository.Repository;
import org.outerj.daisy.repository.query.QueryManager;
import org.outerj.daisy.repository.clientimpl.RemoteRepositoryManager;
import org.outerx.daisy.x10.SearchResultDocument;

import java.util.Locale;

public class Search {
    public static void main(String[] args) throws Exception {
        RepositoryManager repositoryManager = new RemoteRepositoryManager(
            "http://localhost:9263", new Credentials("guest", "guest"));
        Repository repository =
            repositoryManager.getRepository(new Credentials("testuser", "testuser"));
        QueryManager queryManager = repository.getQueryManager();

        SearchResultDocument searchresults =
            queryManager.performQuery("select id, name where true", Locale.getDefault());
        SearchResultDocument.SearchResult.Rows.Row[] rows =
            searchresults.getSearchResult().getRows().getRowArray();

        for (int i = 0; i < rows.length; i++) {
            String id = rows[i].getValueArray(0);
            String name = rows[i].getValueArray(1);
            System.out.println(id + " : " + name);
        }

        System.out.println("Total number: " + rows.length);

    }
}

The credentials supplied in the constructor of the RemoteRepositoryManager specify a user to be used for filling the caches in the repository client. This can be any user, the user doesn't need any special access privileges.

4.15.1.5 Java client application with Cache Invalidation

Needs updating for switch to ActiveMQ (java sample code also need updating due to new required JMS client ID)

For long-running client applications you may want to have the caches of the client invalidated when changes happen by other users. For a code sample of how to create a JMS client and pass it on to the RemoteRepositoryManager, see JMS Cache Invalidation Sample.

For this example to run, you'll need the JMS client implementation jars in the CLASSPATH, in addition to the earlier listed jars:

DAISY_HOME/lib/daisy/jars/daisy-jmsclient-impl-1.3.jar
DAISY_HOME/lib/exolabcore/jars/exolabcore-0.3.7.jar
DAISY_HOME/lib/openjms/jars/openjms-client-0.7.6.jar

4.15.1.6 More

It might be interesting to also have a look at the notes on scripting using Javascript, since there essentially the same API is used from a different language.

4.15.2 Scripting the repository using Javascript

4.15.2.1 Introduction

Rhino, a Java-based Javascript implementation, makes it easy to use the Java API of the repository server to automate all kinds of operations. In other words: easy scripting of the repository server. It brings all the benefits of Daisy's high-level repository API without requiring Java knowledge or the setup of a development environment.

4.15.2.2 How does it work?

  1. Write a Javascript, save it in a ".js" file.
  2. Open a command prompt or shell, set the DAISY_HOME environment variable to point to your Daisy installation
  3. Go to the directory <DAISY_HOME>/bin
  4. Execute "daisy-js <name-of-scriptfile>"

4.15.2.3 Connecting to the repository server from Javascript

The basic code you need to connect to the repository server from Javascript is the following:

importPackage(Packages.org.outerj.daisy.repository);
importClass(Packages.org.outerj.daisy.repository.clientimpl.RemoteRepositoryManager);

var repositoryManager = new RemoteRepositoryManager("http://localhost:9263",
                                             new Credentials("guest", "guest"));
var repository = repositoryManager.getRepository(new Credentials("testuser", "testuser"));

Some explanation:

The importPackage and importClass statements are used to make the Daisy Java API available in the Javascript environment.

Then a RepositoryManager is constructed, this is an object from which Repository objects can be retrieved. A Repository object represents a connection to the Daisy Repository Server for a certain user. Typically, you only construct one RepositoryManager, and then retrieve different Repository objects from it if you want to perform actions under different users.

The first argument of the RepositoryManager constructor is the address of the HTTP interface of the repository server (9263 is the default port). The second argument is a username and password for a user that is used inside the implementation to fill up caches. This can be any user, here we've re-used the guest user of the Wiki. This user does not need to have any access permissions on documents. (Inside the implementation, some often needed info like the repository schema and the collections is cached)

Then from the RepositoryManager a Repository for a specific user is retrieved.

4.15.2.4 Repository Java API documentation

Reference documentation of the Daisy API is available online and included in the binary distribution in the apidocs directory (open the file index.html in a web browser). See also Java API.

4.15.2.5 Examples

4.15.2.5.1 Creating a document (uploading an image)

This example uploads an image called "myimage.gif" from the current directory into the repository.

importPackage(Packages.org.outerj.daisy.repository);
importClass(Packages.org.outerj.daisy.repository.clientimpl.RemoteRepositoryManager);

var repositoryManager = new RemoteRepositoryManager("http://localhost:9263",
                                                    new Credentials("guest", "guest"));
var repository = repositoryManager.getRepository(new Credentials("testuser", "testuser"));

var document = repository.createDocument("My test image", "Image");
var imageFile = new java.io.File("myimage.gif");
document.setPart("ImageData", "image/gif", new FilePartDataSource(imageFile));
document.save();

print("Document created, ID = " + document.getId());

See the API documentation for the purpose of the arguments of the methods. For example, the text "Image" supplied as the second argument of the createDocument method is the name of the document type to use for the document. Likewise, the first argument of setPart, "ImageData", is the name of the part.

It would be an interesting exercise to extend this example to upload a whole directory of images :-)

4.15.2.5.2 Performing a query

importPackage(Packages.org.outerj.daisy.repository);
importClass(Packages.org.outerj.daisy.repository.clientimpl.RemoteRepositoryManager);
importPackage(Packages.java.util);

var repositoryManager = new RemoteRepositoryManager("http://localhost:9263",
                                                    new Credentials("guest", "guest"));
var repository = repositoryManager.getRepository(new Credentials("testuser", "testuser"));
var queryManager = repository.getQueryManager();

var searchresults = queryManager.performQuery("select id, name where true", Locale.getDefault());
var rows = searchresults.getSearchResult().getRows().getRowArray();
for (var i = 0; i < rows.length; i++) {
  print(rows[i].getValueArray(0) + " : " + rows[i].getValueArray(1));
}

print("Total number: " + rows.length);

4.15.2.5.3 Creating a user

importPackage(Packages.org.outerj.daisy.repository);
importClass(Packages.org.outerj.daisy.repository.clientimpl.RemoteRepositoryManager);

var repositoryManager = new RemoteRepositoryManager("http://localhost:9263",
                                                    new Credentials("guest", "guest"));
var repository = repositoryManager.getRepository(new Credentials("testuser", "testuser"));

// With the UserManager we can manage users and roles
var userManager = repository.getUserManager();

// Get references to some roles, we'll need them in a moment
var guestRole = userManager.getRole("guest", false);
var adminRole = userManager.getRole("Administrator", false);

// check if current user has admin role, and exit if not
var me  = userManager.getUser(repository.getUserId(), false);
var adminRoleId = adminRole.getId();
if (!me.hasRole(adminRoleId))
{
  print ("current user lacks admin rights to add new user. ");
  quit();
}
repository.switchRole(adminRoleId);

// Create the new user
var newUser = userManager.createUser("john");

// The user needs to be added to at least one role
newUser.addToRole(guestRole);
newUser.addToRole(adminRole);

// Optionally, set a default role which will be active after
// logging in. If not set, all roles (with the exception of
// the Administrator role) will be active on login
// newUser.setDefaultRole(guestRole);

// Password is required when using Daisy's built-in authentication scheme
newUser.setPassword("boe");

// Alternatively, set another authentication scheme:
// newUser.setAuthenticationScheme("my-ldap");

// Optional things
newUser.setFirstName("John");
newUser.setLastName("Johnson");

// Setting updateableByUser will allow the user to access the
// user settings page in the Wiki, so that the user can
// update their e-mail
newUser.setUpdateableByUser(true);

newUser.save();

4.15.2.5.4 Changing the password of an existing user

importPackage(Packages.org.outerj.daisy.repository);
importPackage(Packages.org.outerj.daisy.repository.user);
importClass(Packages.org.outerj.daisy.repository.clientimpl.RemoteRepositoryManager);

var repositoryManager = new RemoteRepositoryManager("http://localhost:9263",
                                                    new Credentials("guest", "guest"));
var repository = repositoryManager.getRepository(new Credentials("testuser", "testuser"));
repository.switchRole(Role.ADMINISTRATOR);

var userManager = repository.getUserManager();
var user = userManager.getUser("someuser", true);
user.setPassword("somepwd");
user.save();

4.15.2.5.5 Workflow samples

See Workflow Java API.

4.15.2.5.6 Your example here

If you've got a cool example to contribute, just write to the Daisy mailing list.

4.15.3 HTTP API

4.15.3.1 Introduction

Daisy provides an HTTP+XML interface: you talk to the repository server by exchanging XML messages over HTTP. This interface offers full access to all functionality of the repository.

HTTP allows a limited number of methods (Daisy uses GET, POST and DELETE) to be performed on an unlimited number of resources, which are identified by URIs. The GET method is used to retrieve a representation of the addressed resource, POST to trigger a process that modifies the addressed resource, and DELETE to delete a resource.

With HTTP, all calls are independent of each other; there is no session with the server.

The Daisy HTTP interface listens by default on port 9263. It is easy to try out: if Daisy is running on your localhost, enter the URL below in the location bar of your browser and press enter. The browser will then send a GET request to the server. The example given here is a request to execute a query (written in the Daisy Query Language). This request doesn't require an XML payload; all parameters are specified as part of the URL. Note that spaces in the URL must be encoded, here as a plus symbol.

http://localhost:9263/repository/query?q=select+id,name+where+true&locale=en

The browser will ask for a user name and password. Enter your Daisy repository user name and password (i.e., the ones you otherwise use to log in to the Daisy Wiki), or use the user name "guest" and password "guest" (this only works if you installed the Daisy Wiki). The browser will show the XML response received from the server (in some browsers, you might need to do "view source" to see it).

Not all operations can be performed as easily as the above example: some require POST or DELETE as method, some require an XML document in the body of the request, and some even require a multipart-formatted request body (the document create and update operations, which need to upload the binary part data next to the XML message). If you have a programming language with a decent HTTP client library, none of this should be a problem.
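The query URL shown earlier can also be assembled in code rather than typed by hand. The following is a minimal sketch using only the Java standard library (Java 10 or later); the class and method names are made up for this illustration and are not part of Daisy.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class QueryUrlSketch {

    /**
     * Builds the URL for a GET on /repository/query.
     * URLEncoder turns spaces into '+' and percent-encodes other
     * reserved characters (a comma becomes %2C, for example).
     */
    static String queryUrl(String baseUrl, String query, String locale) {
        return baseUrl + "/repository/query"
                + "?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&locale=" + URLEncoder.encode(locale, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Same query as in the browser example above
        System.out.println(queryUrl("http://localhost:9263", "select id,name where true", "en"));
    }
}
```

The resulting URL differs from the hand-written one only in that the comma is percent-encoded; after decoding, the server receives the same query string.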

4.15.3.2 Authentication

All requests require authentication. Authentication is done using BASIC authentication.

If you want to log in with a role other than the default role of a user, append "@<roleid>" to the login (without the quotes). Note that it must be the ID of the role, not its name. For example, if your default role is not Administrator (ID: 1), but you would like to perform the request as Administrator, and your login is "jules", you would use "jules@1". When the login itself contains an @-symbol, it must be escaped by doubling it (i.e. each @ should be replaced with @@). Multiple active roles can be specified using a comma-separated list, e.g. "jules@1,105".
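The login rules above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class and method names are hypothetical, and the header is the standard BASIC authentication header (base64 of "login:password"), not something specific to Daisy.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class DaisyLoginSketch {

    /**
     * Builds the login string: every '@' in the login is doubled,
     * then the role IDs (if any) are appended as "@id1,id2,...".
     */
    static String loginWithRoles(String login, long... roleIds) {
        StringBuilder sb = new StringBuilder(login.replace("@", "@@"));
        for (int i = 0; i < roleIds.length; i++) {
            sb.append(i == 0 ? '@' : ',').append(roleIds[i]);
        }
        return sb.toString();
    }

    /** Builds the value of the HTTP Authorization header for BASIC authentication. */
    static String basicAuthHeader(String effectiveLogin, String password) {
        String pair = effectiveLogin + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(loginWithRoles("jules", 1));       // jules@1
        System.out.println(loginWithRoles("jules", 1, 105));  // jules@1,105
        System.out.println(basicAuthHeader(loginWithRoles("jules", 1), "secret"));
    }
}
```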

4.15.3.3 Robustness

The current implementation doesn't do many checks on the XML posted as part of an HTTP request. This means that, for example, missing elements or attributes might simply cause not very descriptive (but harmless) "NullPointerExceptions" to occur.

The reason for this is that we use the HTTP API mostly via the repository Java client, which generates valid messages for us.

Since the XML posted to a resource is usually the same as the XML retrieved via GET on the same resource, it is easy to get examples of correct XML messages. XML Schemas are also available (see further on), though being schema-valid doesn't necessarily imply the message is correct.

4.15.3.4 Error handling

If a request was handled correctly, the server will answer with HTTP status code 200 (OK). If the status code has another value, it means something went wrong.

For errors generated explicitly, or when a Java exception occurs, an XML message describing the error is created and returned with status code 202 (Accepted). The XML message consists of an <error> root element whose child is either a <description> element or a <cause> element. The <description> element contains a simple string describing the error. The <cause> element is used when a Java exception was handled; it contains further elements describing the exception (including the stack trace), and can recursively include <cause> elements describing the "causing" exceptions of that exception. To see an example of this, simply do a request for a non-existing resource, e.g.:

http://localhost:9263/repository/document/99999999

(assuming there is no document with ID 99999999)

When executing a method (GET, POST, DELETE, ...) on a resource that doesn't support that method you will get status code 405 (Method Not Allowed).

Incorrect or missing authentication information will give status code 401 (Unauthorized).

Missing request parameters, or invalid ones (e.g. giving a string where a number was expected) will give status code 400 (Bad Request).

Doing a request for a non-existing resource will give status code 404 (Not Found).
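A client typically wants to pull the human-readable text out of such an error message. The sketch below does this with the JDK's DOM parser. It rests on assumptions: the class name is hypothetical, the sample message in main is hand-written rather than captured from a server, and elements are matched by local name only because this document does not say which namespace (if any) the error elements use.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DaisyErrorSketch {

    /**
     * Returns the text of the <description> child of an <error> message,
     * or null when there is no such child (e.g. the error carries a <cause>).
     */
    static String describe(String xml) {
        Element root;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // lets us compare local names, whatever the namespace
            root = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("response body is not well-formed XML", e);
        }
        if (!"error".equals(root.getLocalName())) {
            throw new IllegalArgumentException("not an error message: " + root.getNodeName());
        }
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE && "description".equals(child.getLocalName())) {
                return child.getTextContent();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hand-written sample, assumed shape only
        System.out.println(describe("<error><description>Something went wrong</description></error>"));
    }
}
```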

4.15.3.5 Intro to the reference

The rest of this document describes the available URLs, the operations that can be performed upon them, and the format of the XML messages. The descriptions can be dense: the current goal of this document is just to give a broad overview, and more details might be added later. You can always ask for more information on the Daisy Mailing List.

You can also investigate how things are supposed to work by monitoring the HTTP traffic between the Daisy Wiki and the Daisy Repository Server.

Where XML Schema files are referenced, these can be found in the Daisy source distribution.

4.15.3.6 Core Repository Interface

4.15.3.6.1 Documents

On many document-related resources, request parameters called branch and language can be added (this will be mentioned in each case). The value of these parameters can be either a name or ID of a branch or language. If not specified, the branch "main" and the language "default" are assumed.

4.15.3.6.1.1 /repository/document

This resource represents the set of all documents. GET is not supported on this resource (you can retrieve a list of all documents using a query).

POST on this resource is used to create a new document, which also implies the creation of a document variant, since a document cannot exist without a document variant. The payload should be a multipart request having one multipartrequest-part (we use this long name to distinguish it from Daisy's document parts) containing the XML description of the new document, and other multipartrequest-parts containing the content of the document parts (if any). The multipartrequest-part containing the XML should be called "xml", and should conform to the document.xsd schema. The part elements in the XML should have dataRef attributes whose value is the name of the multipartrequest-part containing the data for that part.

The server will return the XML description of the newly created document as result. This XML will, among other things, have the id attribute completed with the ID of the new document.

4.15.3.6.1.1.1 Example scenario: creating a new document

This example illustrates how to create a new document in the repository over the HTTP interface using the curl tool. Curl is a handy command-line tool for doing HTTP (and other) requests, and is available by default on many Linux distributions (it exists for Windows too).

Suppose we want to create a new document of type 'SimpleDocument' (as used in the Daisy Wiki), with the part 'SimpleDocumentContent'. We start by creating the XML description of the document, and save it in a file called newdoc.xml:

<?xml version="1.0" encoding="UTF-8"?>
<ns:document
  xmlns:ns="http://outerx.org/daisy/1.0"
  name="My test doc"
  typeId="2"
  owner="3"
  validateOnSave="true"
  newVersionState="publish"
  retired="false"
  private="false"
  branchId="1"
  languageId="1"
  >
  <ns:customFields/>
  <ns:collectionIds>
    <ns:collectionId>1</ns:collectionId>
  </ns:collectionIds>
  <ns:fields/>
  <ns:parts>
    <ns:part mimeType="text/xml" typeId="2" dataRef="data1"/>
  </ns:parts>
  <ns:links/>
</ns:document>

Some items in the above XML will need to be changed for your installation:

Now we need to create a file containing the content of the part we're going to add. For example create a file called 'mynewfile.xml' and put the following in it:

<html>
  <body>

    <h1>Hi there!</h1>

    <p>This is a test document.</p>

  </body>
</html>

Finally we are ready to create the document using curl:

curl --basic \
     --user testuser:testuser \
     --form xml=@newdoc.xml \
     --form data1=@mynewfile.xml \
     http://localhost:9263/repository/document

The trailing backslashes continue the command across lines in a Unix shell; elsewhere (e.g. on Windows) enter all arguments on one line. Change the user, password and server URL as appropriate for your installation. Note that the form parameter 'data1' corresponds to the dataRef attribute in the newdoc.xml file (you can choose any name you want for these; if you have multiple parts, use different names).
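If you are not using curl, the multipart body has to be assembled by your own code. The sketch below shows one way to build a multipart/form-data body in plain Java, with one multipartrequest-part named "xml" and one part per dataRef name. The class name, the boundary value, and the filename and content type given to each part (chosen to resemble what curl sends for an @file argument) are assumptions for illustration; this document does not state which of these headers the repository server requires.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class MultipartSketch {

    /** Builds a multipart/form-data body; each map entry becomes one multipartrequest-part. */
    static byte[] buildBody(String boundary, Map<String, byte[]> parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Map.Entry<String, byte[]> part : parts.entrySet()) {
            String header = "--" + boundary + "\r\n"
                    + "Content-Disposition: form-data; name=\"" + part.getKey()
                    + "\"; filename=\"" + part.getKey() + "\"\r\n"
                    + "Content-Type: application/octet-stream\r\n\r\n";
            byte[] headerBytes = header.getBytes(StandardCharsets.UTF_8);
            out.write(headerBytes, 0, headerBytes.length);
            out.write(part.getValue(), 0, part.getValue().length);
            out.write('\r');
            out.write('\n');
        }
        byte[] closing = ("--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8);
        out.write(closing, 0, closing.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Map<String, byte[]> parts = new LinkedHashMap<>();
        // "xml" carries the document description; "data1" matches dataRef="data1"
        parts.put("xml", "<ns:document .../>".getBytes(StandardCharsets.UTF_8));
        parts.put("data1", "<html><body><p>This is a test document.</p></body></html>"
                .getBytes(StandardCharsets.UTF_8));
        System.out.write(buildBody("daisy-sketch-boundary", parts), 0,
                buildBody("daisy-sketch-boundary", parts).length);
        System.out.flush();
    }
}
```

The body would be POSTed to /repository/document with the request header Content-Type: multipart/form-data; boundary=daisy-sketch-boundary, using whatever HTTP client you prefer.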

4.15.3.6.1.2 /repository/document/<id>

<id> should be replaced with the ID of an existing document.

4.15.3.6.1.2.1 Retrieving a document

GET on this resource retrieves an XML description of a document, with a certain variant of the document. The XML will contain the data of the most recent version of the document variant. The (binary) part data is not embedded in the XML, but must be retrieved separately using the following URL (described further on):

/repository/document/<document-id>/version/<version-id>/part/<parttype-id>/data

To specify the document variant, add the optional request parameters branch and language.

4.15.3.6.1.2.2 Updating a document or adding a document variant

POST on this resource is used to update a document (and/or document variant), or to add a new variant to it. When adding a new variant there are two possibilities: initialise the new variant with the content of an existing variant, or create a new variant from scratch. We now describe these three distinct cases.

To update an existing document (document variant), the format of the POST is similar to that used when creating a document, that is, it should contain a multipart-format body. The XML in this case should be an updated copy of the XML retrieved via the GET on this resource. Unmodified parts don't need to be uploaded again.

To create a new variant from scratch, the POST data is again similar to that used when creating a new document. In addition, three request parameters must be specified:

Although the variant is created from scratch, it is only possible to add a new variant to a document if you have at least read access to an existing variant. The new variant to be created is specified by the branchId and languageId attributes within the posted XML.

Creating a new variant based on an existing variant is rather different. In this case no XML body or multipart request is needed; instead, do a POST with the following request parameters:

The parameter names are self-explanatory. The branches and languages can be specified either by name or ID.

4.15.3.6.1.2.3 Deleting a document or a document variant

DELETE on this resource permanently deletes the document. This will delete the document and all its variants.

To delete only one variant of the document, specify the request parameters branch and language.

4.15.3.6.1.3 /repository/document/<id>/version

GET on this resource returns a list of all versions in a document variant as XML. For each version, only some basic information is included (the things typically needed to show a version overview page).

To specify the document variant, add the optional request parameters branch and language.

4.15.3.6.1.4 /repository/document/<id>/version/<id>

GET on this resource returns the full XML description of this version. As when requesting a document, the actual binary part data is not embedded in the XML but has to be retrieved separately.

POST on this resource is used to modify the version state (which is the only thing of a version that can be modified; other than that, versions are read-only once created). The request should have two parameters:

For both the GET and POST methods, add the optional request parameters branch and language to specify the variant.

4.15.3.6.1.5 /repository/document/<id>/version/<id>/part/<id>/data

GET on this resource retrieves the data of a part in a certain version of a document. The meaning of the <id>'s is as follows:

  1. The first <id> is the document ID
  2. The second <id> is the version ID (1, 2, 3, ...) or one of the strings "live" or "last" to signify the live or last version
  3. The third <id> is the part type ID or the part type name of the part to be retrieved (if the first character is a digit, it is assumed to be an ID; part type names cannot begin with a digit).

To specify the document variant, add the optional request parameters branch and language.

On the response, the HTTP headers Last-Modified, Content-Length and Content-Type will be specified.
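As an illustration of how the three <id> segments and the optional variant parameters fit together, here is a small Java sketch that assembles the URL. The class and method names are hypothetical; the digit check mirrors the rule stated above for telling part type IDs from part type names.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PartDataUrlSketch {

    /** True when the segment would be read as a part type ID: its first character is a digit. */
    static boolean looksLikePartTypeId(String partType) {
        return !partType.isEmpty() && partType.charAt(0) >= '0' && partType.charAt(0) <= '9';
    }

    /**
     * Builds the URL of the part data resource.
     * version may be a number, "live" or "last"; branch and language may be null,
     * in which case the parameter is left out and the server default applies.
     */
    static String partDataUrl(String baseUrl, String documentId, String version,
                              String partType, String branch, String language) {
        StringBuilder url = new StringBuilder(baseUrl)
                .append("/repository/document/").append(documentId)
                .append("/version/").append(version)
                .append("/part/").append(partType)
                .append("/data");
        String separator = "?";
        if (branch != null) {
            url.append(separator).append("branch=").append(URLEncoder.encode(branch, StandardCharsets.UTF_8));
            separator = "&";
        }
        if (language != null) {
            url.append(separator).append("language=").append(URLEncoder.encode(language, StandardCharsets.UTF_8));
        }
        return url.toString();
    }

    public static void main(String[] args) {
        System.out.println(partDataUrl("http://localhost:9263", "123", "live",
                "SimpleDocumentContent", "main", "default"));
    }
}
```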

4.15.3.6.1.6 /repository/document/<id>/lock

See the lock.xsd file for the XML Schema of the XML used to interact with this resource.

GET on this resource returns information about the lock, if any.

POST on this resource is used to create a lock. In this case, all attributes in the XML must have a value except for the hasLock attribute.

DELETE on this resource is used to remove the lock (if any). No request body is required.

For all three methods, the returned result is the XML description of the lock after the performed operation (possibly describing that there is no lock).

A lock applies to a certain variant of a document. To specify the document variant, add the optional request parameters branch and language.

4.15.3.6.1.7 /repository/document/<id>/comment

See comment.xsd for the XML Schema of the messages.

GET on this resource returns the list of comments for a document variant. To specify the document variant, add the optional request parameters branch and language.

POST on this resource creates a new comment. The branch and language are in this case specified in the XML message.

4.15.3.6.1.8 /repository/document/<id>/comment/<id>

DELETE on this resource deletes a comment. The second <id> is the ID of the comment. To specify the document variant, add the optional request parameters branch and language.

Other methods are not supported on this resource.

4.15.3.6.1.9 /repository/document/<id>/availableVariants

A GET on this resource returns the list of variants that exist for this document.

4.15.3.6.2 Schema Management
4.15.3.6.2.1 /repository/schema/(part|field|document)Type

These resources represent the set of part, field and document types.

POST to these resources is used to create a new part, field or document type. The request body should contain an XML message conforming to the schemas found in fieldtype.xsd, parttype.xsd or documenttype.xsd.

4.15.3.6.2.1.1 Example scenario

This example illustrates how to create a new field type (with a selection list), simply by using the well-known "wget" tool.

This is the XML that we'll send to the server:

<?xml version="1.0"?>
<fieldType name="myNewField" valueType="string" deprecated="false"
           aclAllowed="false" size="0"
           xmlns="http://outerx.org/daisy/1.0">
  <labels>
    <label locale="">My New Field</label>
  </labels>
  <descriptions>
    <description locale="">This is a test field</description>
  </descriptions>
  <selectionList>
    <staticSelectionList>  
      <listItem>
        <labels/>
        <string>value 1</string>
      </listItem>
      <listItem>
        <labels/>
        <string>value 2</string>
      </listItem>
    </staticSelectionList>
  </selectionList>
</fieldType>

Let's say we save this in a file called newfieldtype.xml. We can then create the field by executing:

wget --post-file=newfieldtype.xml \
     --http-user=testuser@1 \
     --http-password=testuser \
     http://localhost:9263/repository/schema/fieldType

This supposes that "testuser" exists and has the Administrator role, which is required for creating field types.

Wget will save the response from the server in a file called "fieldType". The response is the same XML, now with some additional attributes such as the assigned ID. The response XML isn't pretty-printed; if you have libxml installed, you can view it nicely formatted using:

xmllint --format fieldType

4.15.3.6.2.2 /repository/schema/(part|field|document)Type/<id>

GET on these resources retrieves the XML representation of a part, field or document type.

POST on these resources updates a part, field or document type. The request body should then contain an altered variant of the XML retrieved via GET.

DELETE on these resources deletes them. Note that deleting types is only possible if they are not in use any more by any version of any document.

4.15.3.6.2.2.1 Example scenario

Let's take the previous field type example again, and add an additional value to the selection list. We first retrieve the XML for the field type (check the XML response of the previous sample to know the ID of the created field type):

wget http://localhost:9263/repository/schema/fieldType/1

This will save a file called "1" (if that was the requested ID). To make it easier to work with, do a:

xmllint --format 1 > updatedfieldtype.xml

and edit updatedfieldtype.xml to add a value to the selection list:

<?xml version="1.0" encoding="UTF-8"?>
<ns:fieldType xmlns:ns="http://outerx.org/daisy/1.0" size="0" updateCount="1"
  aclAllowed="false" deprecated="false" valueType="string"
  name="myNewField" lastModifier="3"
  lastModified="2004-09-09T09:06:51.032+02:00" id="1">
  <ns:labels>
    <ns:label locale="">My New Field</ns:label>
  </ns:labels>
  <ns:descriptions>
    <ns:description locale="">This is a test field</ns:description>
  </ns:descriptions>
  <ns:selectionList>
    <ns:staticSelectionList>
      <ns:listItem>
        <ns:labels/>
        <ns:string>value 1</ns:string>
      </ns:listItem>
      <ns:listItem>
        <ns:labels/>
        <ns:string>value 2</ns:string>
      </ns:listItem>
      <ns:listItem>
        <ns:labels/>
        <ns:string>value 3</ns:string>
      </ns:listItem>
    </ns:staticSelectionList>
  </ns:selectionList>
</ns:fieldType>

And then do:

wget --post-file=updatedfieldtype.xml \
     --http-user=testuser@1 \
     --http-password=testuser \
     http://localhost:9263/repository/schema/fieldType/1

4.15.3.6.2.3 /repository/schema/(part|field|document)TypeByName/<name>

GET on this resource retrieves a part, field or document type by its name.

You cannot POST on this resource; to update the type, use the previous (ID-based) resource.

4.15.3.6.2.4 /repository/schema/fieldType/<id>/selectionListData

GET on this resource retrieves the data of the selection list of the field type. If the field type does not have a selection list, an error will be returned.

When retrieving the XML representation of a field type, the selection list contained therein is the definition of the selection list, which can e.g. be a query. This resource instead returns the "expanded" selection list data, i.e. with queries executed etc.

Request parameters:

The branch and language parameters are not always relevant; they are used for selection lists that filter their items according to the branch and language.

4.15.3.6.3 Access Control Management
4.15.3.6.3.1 /repository/acl/<id>

...

4.15.3.6.3.2 /repository/filterDocumentTypes

...

4.15.3.6.4 Collection Management
4.15.3.6.4.1 /repository/collection/<id>

...

4.15.3.6.4.2 /repository/collectionByName/<name>

...

4.15.3.6.4.3 /repository/collection

...

4.15.3.6.5 User Management
4.15.3.6.5.1 /repository/user

...

4.15.3.6.5.2 /repository/role

...

4.15.3.6.5.3 /repository/user/<id>

...

4.15.3.6.5.4 /repository/role/<id>

...

4.15.3.6.5.5 /repository/userByLogin/<login>

...

4.15.3.6.5.6 /repository/roleByName/<name>

...

4.15.3.6.5.7 /repository/usersByEmail/<email>

...

4.15.3.6.5.8 /repository/userIds

...

4.15.3.6.5.9 /repository/publicUserInfo

...

4.15.3.6.5.10 /repository/publicUserInfo/<id>

...

4.15.3.6.5.11 /repository/publicUserInfoByLogin/<login>

...

4.15.3.6.6 Variant Management
4.15.3.6.6.1 /repository/branch

...

4.15.3.6.6.2 /repository/branch/<id>

...

4.15.3.6.6.3 /repository/branchByName/<name>

...

4.15.3.6.6.4 /repository/language

...

4.15.3.6.6.5 /repository/language/<id>

...

4.15.3.6.6.6 /repository/languageByName/<name>

...

4.15.3.6.7 Querying
4.15.3.6.7.1 /repository/query

GET on this resource is used to perform queries using the Daisy Query Language.

Required parameters:

Optional parameters:

4.15.3.6.7.2 /repository/facetedQuery

Used to perform a query for which the result contains the distinct values for the different items returned by the query. This makes it possible to build a "faceted navigation" front end.

The query parameters are specified in an XML document which should be posted to this resource. The format of the XML is defined in facetedquery.xsd.

4.15.3.6.8 Namespace management
4.15.3.6.8.1 /repository/namespace

Use GET to get a list of all namespaces registered in this repository.

To create a namespace, use POST with parameters name and fingerprint (no XML body). The fingerprint is optional; if not specified, the repository server will generate one.

4.15.3.6.8.2 /repository/namespace/<id>

Use GET to retrieve information about a namespace, use DELETE to unregister a namespace.

The <id> is the internal ID of this namespace, which is repository-specific. It is more common to use /repository/namespaceByName.

4.15.3.6.8.3 /repository/namespaceByName/<name>

Use GET to retrieve information about a namespace, use DELETE to unregister a namespace.

4.15.3.6.9 Other
4.15.3.6.9.1 /repository/userinfo

GET on this resource returns some information about the authenticated user. Takes no parameters.

4.15.3.6.9.2 /repository/comments

Usually comments are retrieved via the document they belong to, but it is also possible to get all comments of a user by doing a GET on this resource. Parameters:

4.15.3.7 Navigation Manager Extension

See also navigation.

In Daisy 2.0, the URLs for the navigation manager changed slightly (only the URLs, not their implementation) to be consistent with the "/namespace/*" format. The old URLs remain supported for now, but it is recommended to switch to the new ones: /navigation -> /navigation/tree, /navigationLookup -> /navigation/lookup, /navigationPreview -> /navigation/preview

/navigation/tree

GET on this resource retrieves a navigation tree (customised for the authenticated user).

Parameters, all required unless indicated otherwise:

/navigation/preview

This resource generates a navigation tree from a navigation source description specified as part of the request. This is used in the Daisy Wiki application to try out navigation trees before saving them.

Parameters:

/navigation/lookup

Resolves a path against the navigation tree and returns the result of that lookup as an XML message. For more details, see the Java API (e.g. the class NavigationLookupResult).

Required parameters:

4.15.3.8 Publisher Extension

The purpose of the Publisher Extension component is to return in one call all the data needed to publish pages in the Daisy Wiki (or other front end applications).

/publisher/request

To this resource you can POST a publisher request. A publisher request takes the form of an XML document, and is described in detail elsewhere in this documentation.

/publisher/blob

GET on this resource retrieves a blob (the data of a part).

Required parameters:

The last modified, content type and content length headers are set.

4.15.3.9 Email Notifier Extension

The Email Notifier extension component makes available resources for managing the email subscriptions.

/emailnotifier/subscription/<id>

<id> is the ID of a user.

GET, POST and DELETE are supported. For the XML Schema, see subscription.xsd.

/emailnotifier/subscription

GET on this resource returns all the subscriptions.

/emailnotifier/(document|schema|user|acl|collection)EventsSubscribers

GET on this resource returns all subscribers for the kind of event as specified in the request path. The returned information for each subscriber includes the user ID and the locale for the subscription.

In the case of documentEventsSubscribers, the following additional request parameters are required:

The returned subscribers are then those that are explicitly subscribed for changes to that document, or those who are subscribed to a collection to which the document belongs, or those that are subscribed to all collections.

/emailnotifier/documentSubscription/<documentId>

This resource makes it possible to manage document-based subscriptions without having to go through the full subscriptions.

Using POST on this resource adds or removes a subscription for the specified document for some user. Required parameters:

Using DELETE on this resource removes the subscriptions for the specified document for all users. This is useful to clean up subscriptions when a document gets deleted. If the branch and language parameters are missing, the subscriptions for all variants of the document will be removed, otherwise only those for the specified variant.

/emailnotifier/documentSubscription/<documentId>/<userId>

This resource provides a quick way to check (using GET) whether a user is subscribed for notifications to a certain document. Request parameters branch and language are required, specifying the ID of the branch and language.

/emailnotifier/collectionSubscription/<collectionId>

Only DELETE is supported on this resource, and deletes all subscriptions for the specified collection for all users.

4.15.3.10 Emailer Extension

The emailer extension makes it possible to send emails. Only Administrators can do this.

/emailer/mail

A POST to this resource will send an email. The following parameters are required:

4.15.3.11 Document Task Manager Extension

See also Document Task Manager.

/doctaskrunner/task

A GET on this resource retrieves all existing tasks for the current user, or the tasks of all users if the role is Administrator.

A POST on this resource is used to create a new task, in which case the body must contain an XML document describing the task (see also taskdescription.xsd).

/doctaskrunner/task/<id>

A GET on this resource retrieves information about a task.

A POST on this resource in combination with a request parameter "action" with value "interrupt" interrupts a task.

A DELETE on this resource deletes the persistent information about this task.

/doctaskrunner/task/<id>/docdetails

A GET on this resource retrieves detailed information about the execution of the task on the documents.

4.16 Extending the repository

This section contains information for people who want to plug in custom Java-based components in the repository server.

4.16.1 Repository plugins

Daisy provides a number of interfaces through which the repository functionality can be extended or customized. The components that do this are called plugins.

4.16.1.1 Anatomy of a plugin

A plugin is basically an implementation of a certain plugin interface. The various plugin interfaces are listed further on.

To deploy the plugin in the repository server, it should be packaged as a container jar for the Daisy Runtime. A container jar is a jar file containing a Spring container definition.

The Spring container definition should contain a bean which (usually upon initialization) registers the plugin implementation with the plugin registry. The bean which performs the registration can simply be the plugin implementation itself.

To gain access to the plugin registry, the Spring container can use the daisy:import-service instruction (a custom tag provided by the Daisy Runtime).

When the Spring container of the plugin is destroyed, it should properly unregister the plugin.

To deploy the plugin, the container jar can be copied to a specific subdirectory of the repository data directory.

This may sound like a lot, but it's not, as is illustrated by this example.

4.16.1.2 Plugin types

4.16.1.2.1 Extensions

Repository extensions are very generic plugins which can implement any sort of functionality. The main advantage of putting this functionality in the form of a repository extension is that the extension can then be easily retrieved using the Repository.getExtension(name) function.

Various non-core functionality in Daisy has been added as extensions, such as the NavigationManager, the Publisher, the EmailNotifier, etc.

For details see repository extensions.

4.16.1.2.2 Pre-save hooks

A pre-save hook is a component which can modify the content of a document right before it is saved. An example is the image pre-save hook included with Daisy, which can generate thumbnails in document parts and extract EXIF data to document fields.

A pre-save hook should implement the following interface:

org.outerj.daisy.repository.spi.local.PreSaveHook

4.16.1.2.3 Authentication schemes

The task of an authentication scheme is to tell if a username/password combination is valid. By default Daisy uses its own authentication based on Daisy's user management. New authentication schemes can be added to check against external systems. Daisy ships with example authentication schemes for LDAP and NTLM which are usable through simple configuration. If you have other needs, you can implement your own scheme. See also user management and an example.

An authentication scheme should implement the following interface:

org.outerj.daisy.authentication.spi.AuthenticationScheme

4.16.1.2.4 Text extractors

Text extractors extract text from various content formats for the purpose of full text indexing. Daisy includes a variety of such text extractors, e.g. for MS Word and PDF.

A text extractor should implement the following interface:

org.outerj.daisy.textextraction.TextExtractor

4.16.1.2.5 Link extractors

Link extractors extract Daisy document links from various content formats. These extracted links are maintained by the repository to enable searching which documents link to a certain document.

A link extractor should implement the following interface:

org.outerj.daisy.linkextraction.LinkExtractor

4.16.1.2.6 HTTP handlers

Adding new functionality to the HTTP interface of the repository server can be done by implementing a request handler:

org.outerj.daisy.httpconnector.spi.RequestHandler

This is mostly useful for extensions which want to support remote invocation.

4.16.1.2.7 Other

It's possible to add any sort of component to be launched as part of the repository server; it doesn't necessarily need to register something in the plugin registry. Such components could perform all sorts of tasks such as listening to JMS or synchronous repository events, performing timed actions, etc.

4.16.1.3 Plugin registry

To make plugins available, you need to register them with a service called the PluginRegistry.

When registering a plugin, you need to specify the following:

For example, a text extractor is registered like this:

import org.outerj.daisy.textextraction.TextExtractor;
...

TextExtractor extractor = new MyTextExtractor();

pluginRegistry.addPlugin(TextExtractor.class, "my text extractor", extractor);

To gain access to the PluginRegistry, use daisy:import-service to import it into your spring container:

<beans ...

  <daisy:import-service id="pluginRegistry" service="org.outerj.daisy.plugin.PluginRegistry"/>

For a complete example, see the authentication scheme sample.

It is recommended to unregister the plugin when shutting down.

That's all you need to know about registering plugins. It is the responsibility of the PluginRegistry and the user of the plugins to handle the rest.
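
To make the register/unregister lifecycle concrete, here is a minimal, self-contained sketch. SketchPluginRegistry, SketchExtractor and SketchExtractorComponent are hypothetical stand-ins, not Daisy classes. Only the shape of the addPlugin and removePlugin calls is taken from the examples in this documentation, and rejecting duplicate names is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the real PluginRegistry, for illustration only.
class SketchPluginRegistry {
    // Plugins are stored per plugin type, then by name.
    private final Map<Class<?>, Map<String, Object>> plugins = new HashMap<>();

    <T> void addPlugin(Class<T> type, String name, T plugin) {
        Map<String, Object> byName = plugins.computeIfAbsent(type, k -> new HashMap<>());
        if (byName.containsKey(name)) {
            // Assumption of this sketch: names must be unique per plugin type.
            throw new IllegalStateException("Duplicate plugin name: " + name);
        }
        byName.put(name, plugin);
    }

    <T> void removePlugin(Class<T> type, String name, T plugin) {
        Map<String, Object> byName = plugins.get(type);
        // Only remove the entry if it is the very instance that was registered.
        if (byName != null && byName.get(name) == plugin) {
            byName.remove(name);
        }
    }

    <T> T getPlugin(Class<T> type, String name) {
        Map<String, Object> byName = plugins.get(type);
        return byName == null ? null : type.cast(byName.get(name));
    }
}

// Hypothetical plugin type, standing in for an interface such as TextExtractor.
interface SketchExtractor {
    String extract(String input);
}

// A component that registers itself on startup and unregisters on shutdown.
class SketchExtractorComponent implements SketchExtractor {
    private final SketchPluginRegistry registry;

    SketchExtractorComponent(SketchPluginRegistry registry) {
        this.registry = registry;
        registry.addPlugin(SketchExtractor.class, "sketch extractor", this);
    }

    // In a container jar this would be wired as the Spring destroy-method.
    void destroy() {
        registry.removePlugin(SketchExtractor.class, "sketch extractor", this);
    }

    public String extract(String input) {
        return input.trim();
    }
}
```

Pairing the registration in the constructor with a destroy method keeps the registry free of stale entries when a container is shut down.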

4.16.1.4 Deploying plugins

As explained earlier, a plugin should be packaged as a container jar.

The Daisy Runtime configuration for the repository server will automatically include container jars put in the following directories:

<daisy data dir>/plugins/load-before-repository

<daisy data dir>/plugins/load-after-repository

To see how these are included in the runtime configuration, have a look at <DAISY_HOME>/repository-server/conf/runtime-config.xml

As explained in the Daisy Runtime documentation, the container jars put in these directories will be loaded in alphabetical order.

The reason to have two directories, load-before-repository and load-after-repository, is as follows:

For some plugins, it is desirable to register them before the core repository is started because they modify the behavior of the repository. For example, take text extractors (which extract text for the purpose of fulltext indexing). From the moment the repository is started, it can start doing work. For example, it might receive a JMS event of an updated document and fulltext-index it. If we only registered text extractor plugins after the repository was started, there would be a (small) amount of time during which the repository server would try to index documents without these additional text extractors being available. Hence documents handled during this period would be treated differently.

Other plugins might depend on the repository being available (in Daisy Runtime speak: they import services exported by the repository container), and hence can only be loaded after the repository is available.

In general, plugins which modify the behaviour of the repository server (text extractors, pre-save hooks, authentication schemes, etc.) should be put in the load-before-repository directory.

After copying the new plugin(s) into the appropriate directory, you need to restart the repository server for the plugin to be loaded.

4.16.1.5 Repository Extensions

Repository Extensions are a particular type of plugins that add extra functionality to the repository. An Extension is usually related to the repository, e.g. because it needs access to information in the repository. The extension code has no special privileges, it simply makes use of the repository using the normal APIs and SPIs.

In fact, since an extension is simply an application which makes use of the repository via its API, one could wonder why there is a need for adding this code as an extension to the repository. The reasons are:

Extensions registered with the repository can be retrieved like this:

MyExtension myExtension = (MyExtension)repository.getExtension("name-of-the-extension");

whereby MyExtension is the interface of the extension.

Examples of extensions delivered with Daisy: the NavigationManager, the EmailSubscriptionManager, the Emailer, the Publisher and the DocumentTaskManager.

To register an extension, you should provide an implementation of this interface:

org.outerj.daisy.repository.spi.ExtensionProvider

and add it to the plugin registry.

4.16.1.6 Sample: custom authentication scheme

As an example of how to build a repository plugin, here we will look at how to implement an authentication scheme. Other plugins follow a similar approach.

For the purpose of this example, we'll create a FixedPasswordAuthenticationScheme, i.e. an authentication scheme which accepts just one fixed password regardless of the user.

Components to be deployed in the repository server need to be packaged as a container jar. This is a normal jar file containing at least one Spring bean container definition, as described in detail in the Daisy Runtime documentation.

4.16.1.6.1 The container jar

The container jar we're going to build will have the following structure:

org
  foobar
    FixedPasswordAuthenticationScheme.class
DAISY-INF
  spring
    applicationContext.xml

So we only need to create two files.

4.16.1.6.2 Implement the necessary code

For a custom authentication scheme, we need to do two things:

We do this here with one class, shown below. Save this code in a file called FixedPasswordAuthenticationScheme.java

package org.foobar;

import org.outerj.daisy.authentication.AuthenticationScheme;
import org.outerj.daisy.authentication.AuthenticationException;
import org.outerj.daisy.plugin.PluginRegistry;
import org.outerj.daisy.repository.Credentials;
import org.outerj.daisy.repository.user.User;
import org.outerj.daisy.repository.user.UserManager;
import org.apache.avalon.framework.configuration.Configuration;

public class FixedPasswordAuthenticationScheme implements AuthenticationScheme {
    private String name;
    private String description;
    private String password;
    private PluginRegistry pluginRegistry;

    public FixedPasswordAuthenticationScheme(Configuration config,
            PluginRegistry pluginRegistry) throws Exception {
        password = config.getChild("password").getValue();
        name = config.getChild("name").getValue();
        description = config.getChild("description").getValue();
        this.pluginRegistry = pluginRegistry;
        pluginRegistry.addPlugin(AuthenticationScheme.class, name, this);
        System.out.println("Scheme " + name + " added!");
    }

    public void destroy() {
        pluginRegistry.removePlugin(AuthenticationScheme.class, name, this);
    }

    public String getName() {
        return name;
    }

    public String getDescription() {
        return description;
    }

    public boolean check(Credentials credentials) throws AuthenticationException {
        // this is the actual password check, very simple in this case
        return password.equals(credentials.getPassword());
    }

    public void clearCaches() {
        // we have nothing cached
    }

    public User createUser(Credentials credentials, UserManager userManager) throws AuthenticationException {
        // unsupported
        return null;
    }
}

Use the method of your choice to compile the code (Ant, Maven, your IDE, ...). From the command line, it can be compiled as shown below. The command is split over several lines here for readability, but everything should be entered on one line. On Windows, replace $DAISY_HOME with %DAISY_HOME% and the colons with semicolons.

javac -classpath
          $DAISY_HOME/lib/daisy/jars/daisy-repository-api-<version>.jar:
          $DAISY_HOME/lib/daisy/jars/daisy-repository-server-spi-<version>.jar:
          $DAISY_HOME/lib/daisy/jars/daisy-pluginregistry-api-<version>.jar:
          $DAISY_HOME/lib/avalon-framework/jars/avalon-framework-api-4.1.5.jar
      FixedPasswordAuthenticationScheme.java

4.16.1.6.3 Create a Spring bean container definition (applicationContext.xml)

The following is the Spring container definition.

<?xml version="1.0"?>
<beans              xmlns = "http://www.springframework.org/schema/beans"
                xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
              xmlns:daisy = "http://outerx.org/daisy/1.0#runtime-springext"
               xmlns:conf = "http://outerx.org/daisy/1.0#config-springext"
       xsi:schemaLocation = "http://www.springframework.org/schema/beans
                             http://www.springframework.org/schema/beans/spring-beans-2.0.xsd
                             http://outerx.org/daisy/1.0#runtime-springext
                             http://daisycms.org/schemas/daisyruntime-springext.xsd
                             http://outerx.org/daisy/1.0#config-springext
                             http://daisycms.org/schemas/config-springext.xsd">

  <daisy:import-service id="configurationManager"
             service="org.outerj.daisy.configuration.ConfigurationManager"/>
  <daisy:import-service id="pluginRegistry"
             service="org.outerj.daisy.plugin.PluginRegistry"/>

  <bean id="foo" class="org.foobar.FixedPasswordAuthenticationScheme" destroy-method="destroy">
    <constructor-arg>
      <conf:configuration group="foobar" name="fixedpwd-auth" source="configurationManager">
        <conf:default xmlns="">
          <name>fixedpwd</name>
          <description>Fixed global password</description>
          <password>jamesbond</password>
        </conf:default>
      </conf:configuration>
    </constructor-arg>
    <constructor-arg ref="pluginRegistry"/>
  </bean>

</beans>

The special <daisy:import-service> elements are used to import services from other containers. You can assume these services will be available. See the Daisy Runtime for details on how this works.

The component configuration system is also explained elsewhere, for now it suffices to know that the content of the conf:default element will be supplied as default configuration to the component. This configuration could be customized through the myconfig.xml file.

4.16.1.6.4 Deploy the new authentication scheme

Create the jar file (which is a normal zip file) containing the FixedPasswordAuthenticationScheme.class and applicationContext.xml in the directory structure as outlined earlier.

$ find -type f
./org/foobar/FixedPasswordAuthenticationScheme.class
./DAISY-INF/spring/applicationContext.xml

$ jar cvf fixedpwd-auth.jar *

Then copy the jar file to the directory <repo data dir>/plugins/load-before-repository.

Now stop and start the repository server. If everything goes well, the authentication scheme will be loaded and the following line will be printed (to standard out or, if you're using the wrapper, to the wrapper log file):

Scheme fixedpwd added!

If you go to the Administration console in the Daisy Wiki and create or edit a user, you will be able to select the new authentication scheme.

4.16.1.6.5 Follow-up notes

When doing this for real, you would of course use proper logging instead of System.out.println.

For those familiar with Spring, instead of constructor dependency injection, we could as well have used setter dependency injection.

If the implementation of the authentication scheme requires some code on the classpath, then this can be specified by adding a DAISY-INF/classloader.xml file to the container jar. See the Daisy Runtime documentation.

Happy hacking!

4.16.2 Daisy Runtime

4.16.2.1 What is it?

The Daisy Runtime is the platform upon which the repository server runs. Basically, it consists of a set of isolated Spring bean containers, with some infrastructure for setting up classloaders and for sharing services between the Spring containers. It's really just a thin layer around Spring. You could think of it as a “Spring-OSGI-light”.

Some background on how we arrived at this can be found in this blog entry.

4.16.2.2 How it works

The Daisy Runtime takes two important parameters for booting up:

In this documentation, the words “artifact” and “jar” basically mean the same thing.

Each container jar contains a file describing the required classpath, and one or more Spring container definitions. The Daisy Runtime will start a separate Spring container corresponding to each container jar.

The details of the classloader setup are described further on.

Each container can export and import services to/from a common service registry. How this is done is also explained further on.

So once more, summarized: the purpose of the Daisy Runtime is starting Spring containers, setting up classloaders for them, and allowing them to share selected services.

4.16.2.3 The Runtime CLI

There are a few ways to start the runtime:

For both cases, we also have a small launcher jar which has the advantage that you only need to add this launcher jar to your classpath, and it will then set up the classloader to boot the Daisy Runtime. In the case of programmatic access, you will of course need the jars for any APIs you want to use in the current classloader.

4.16.2.4 The runtime config file

The runtime config file is an XML file listing the container jars to be started as part of the runtime.

Container jars can be specified in multiple ways.

The following sample shows how to specify each of them in the configuration:

<?xml version="1.0"?>
<runtime>
  <containers>

    <artifact id="cont1" groupId="foo" artifactId="bar" version="2.1"/>

    <file id="cont2" path="/path/to/file.jar"/>

    <directory id="foo" path="/foo/bar"/>    

  </containers>
</runtime>

The id attribute values need to be unique.

The containers are started in the order as listed here. The order can be important for export and import of services, as described further on.

In the file and directory paths, you can insert system properties using the ${...} notation. For example:

<directory id="something" path="${user.home}${file.separator}/containers"/>
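
As an illustration of what such ${...} substitution amounts to, here is a minimal sketch in plain Java. The class PropertySubstitution and its substitute method are hypothetical and not part of the Daisy Runtime. Leaving unknown properties untouched is an assumption of the sketch, not documented Daisy behaviour.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PropertySubstitution {
    // Matches ${name}, capturing the property name.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces each ${name} with the value of the system property 'name'.
    // Unknown properties are left untouched (an assumption of this sketch).
    static String substitute(String path) {
        Matcher matcher = PLACEHOLDER.matcher(path);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String value = System.getProperty(matcher.group(1));
            String replacement = value != null ? value : matcher.group();
            // quoteReplacement prevents backslashes (Windows paths) and
            // dollar signs in the value from being treated specially.
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(substitute("${user.home}${file.separator}containers"));
    }
}
```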

4.16.2.5 The container jar

A container jar is a normal jar file containing a DAISY-INF directory with metadata for the Daisy Runtime.

The structure is as follows:

DAISY-INF
 + classloader.xml
 + spring
    + applicationContext.xml
    + ...

4.16.2.5.1 The DAISY-INF/spring directory

The spring directory can contain Spring container definitions in the form of XML files. There needs to be at least one. The files can have any name as long as it ends in .xml. Often applicationContext.xml is used.

These are standard Spring XML files, though we have some extension namespaces for the import and export of services, and for the configuration system.

As a template, to have all these namespaces declared, you might use the following:

<?xml version="1.0"?>
<beans              xmlns = "http://www.springframework.org/schema/beans"
                xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
              xmlns:daisy = "http://outerx.org/daisy/1.0#runtime-springext"
               xmlns:conf = "http://outerx.org/daisy/1.0#config-springext"
       xsi:schemaLocation = "http://www.springframework.org/schema/beans
                             http://www.springframework.org/schema/beans/spring-beans-2.0.xsd
                             http://outerx.org/daisy/1.0#runtime-springext
                             http://daisycms.org/schemas/daisyruntime-springext.xsd
                             http://outerx.org/daisy/1.0#config-springext
                             http://daisycms.org/schemas/config-springext.xsd">

  <!-- Import and export sample syntax:
  <daisy:import-service id="" service="(interface FQN)"/>
  <daisy:export-service ref="" service="(interface FQN)"/>
  -->

  <!-- Insert bean definitions here -->
</beans>

4.16.2.5.2 The classloader.xml file

The classloader.xml file lists the artifacts that need to be in the classpath. The container jar itself is always automatically added as the first entry in the classpath. The classloader.xml file is optional.

As mentioned before, jars are referenced as Maven artifact references. The Daisy Runtime will search them in a Maven-style local repository specified at startup of the Runtime. The Runtime does not support automatic downloading of missing artifacts from remote repositories.

The syntax of the classloader.xml is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<classloader>
  <classpath>
    <artifact groupId="" artifactId="" version="" share="allowed|required|prohibited"/>
  </classpath>
</classloader>

This is pretty straightforward, except for the share attribute. Next to the classloader for each container, the Daisy Runtime also creates a common classloader that acts as the parent classloader for each of the container classloaders, as illustrated in the figure below.

By means of the share attribute, you can specify if an artifact can, should or should not be added to the common classloader. This is done using the following attribute values:

When starting up, the Daisy Runtime will first read the classloader configurations of all containers in order to determine which artifacts should be part of the common classloader and which artifacts should be in container-specific classloaders. When using the Runtime CLI, you can specify the command line option --classloader-log to see a report of this.
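
The parent/child relationship between the common classloader and the container classloaders can be illustrated with plain JDK classes. This is only a sketch: the class ClassLoaderHierarchySketch and its build method are hypothetical, and it assumes the standard parent-first delegation of java.net.URLClassLoader. The actual classloader implementation of the Daisy Runtime is not shown here.

```java
import java.net.URL;
import java.net.URLClassLoader;

class ClassLoaderHierarchySketch {
    // Builds one common classloader plus one child classloader per container.
    // sharedJars would hold the artifacts placed in the common classloader,
    // containerJars[i] the artifacts private to container i.
    static URLClassLoader[] build(URL[] sharedJars, URL[][] containerJars) {
        URLClassLoader common = new URLClassLoader(sharedJars,
                ClassLoaderHierarchySketch.class.getClassLoader());
        URLClassLoader[] containers = new URLClassLoader[containerJars.length];
        for (int i = 0; i < containerJars.length; i++) {
            // Each container classloader has the common classloader as parent,
            // so classes from shared jars are loaded once and seen by all.
            containers[i] = new URLClassLoader(containerJars[i], common);
        }
        return containers;
    }
}
```

Because every container classloader shares the same parent, a class loaded from a shared artifact is the same Class object in all containers, which is what allows instances of it to be passed between containers.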

4.16.2.5.2.1 Enforcement of share="prohibited"

Currently share="prohibited" is not enforced. For example, if one container has artifact A as share-allowed and another one has artifact A as share-prohibited, then artifact A will be added to the common classloader, and the Daisy Runtime will only print a warning.

4.16.2.5.2.2 Disabling artifact sharing

Since the share="allowed" mode only indicates optional sharing, things should work just as well when these artifacts are not shared. The Daisy Runtime CLI supports a command line option to do this: --disable-class-sharing.

This can be useful to check that the classpaths of all containers list their required dependencies, and that things which should be common use the share="required" mode.

4.16.2.5.2.3 Sharing/publishing classes requires putting them in a separate jar

As a consequence of how our system works, if you want to add classes to the common classloader they have to be in a separate jar. It is good practice to put APIs and implementation in separate jars, so often this is no problem. This is different from e.g. OSGI, where exporting is done using Java packages rather than jars.

4.16.2.5.2.4 Dependencies between artifacts

If the system decides to put one shareable jar (jar A) in the shared classloader but not another shareable jar (jar B), and jar A depends on jar B, there might be problems, since the classes in jar A won't find the classes in jar B. However, this situation is unlikely to occur, since in that case all containers should have both jar A and jar B as shareable jars.

4.16.2.5.2.5 The order of the entries in the common classloader

There is currently no control over the order of the entries in the common classloader.

4.16.2.5.2.6 Knowing more about classloaders

For using the Daisy Runtime, it suffices to have a very basic understanding of classloaders. If you are interested in learning everything about classloading, a good book is Component Development for the Java Platform.

4.16.2.6 Exporting and importing services

By default one container cannot access the “beans” in another container. The Daisy Runtime provides the ability to export specific beans as “services” and to import services exported by other containers.

This importing and exporting is done by custom elements in the Spring XML container definition.

4.16.2.6.1 Exporting a bean / service

When exporting a service, you need to specify the ID of the bean and the interface which you want to make available:

<daisy:export-service ref="mybeanid" service="org.mydomain.MyInterface"/>

If a bean implements multiple interfaces, and you want to make all of them available as services, you need multiple exports for the same bean:

<daisy:export-service ref="mybeanid" service="org.mydomain.InterfaceA"/>
<daisy:export-service ref="mybeanid" service="org.mydomain.InterfaceB"/>

Note that the service must be an interface, it is not possible to export services using concrete classes.

4.16.2.6.1.1 Only one service of a certain type can be shared

The shared service registry is basically a map using the service interface as key. This means there can be only one service per interface. An exception will be thrown if a second export for the same service interface is done.
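
The "map keyed by service interface" idea can be sketched as follows. SketchServiceRegistry is a hypothetical class written for this illustration, not the registry used by the Daisy Runtime, and the exception types are an arbitrary choice.

```java
import java.util.HashMap;
import java.util.Map;

class SketchServiceRegistry {
    // One entry per service interface.
    private final Map<Class<?>, Object> services = new HashMap<>();

    <T> void export(Class<T> serviceInterface, T implementation) {
        if (!serviceInterface.isInterface()) {
            // Services can only be exported under an interface.
            throw new IllegalArgumentException("Not an interface: " + serviceInterface.getName());
        }
        if (services.containsKey(serviceInterface)) {
            // A second export for the same interface is rejected.
            throw new IllegalStateException("Service already exported: " + serviceInterface.getName());
        }
        services.put(serviceInterface, implementation);
    }

    <T> T lookup(Class<T> serviceInterface) {
        Object service = services.get(serviceInterface);
        if (service == null) {
            throw new IllegalStateException("No such service: " + serviceInterface.getName());
        }
        return serviceInterface.cast(service);
    }
}
```

Using the interface itself as the key is what makes the import syntax so compact: an import only has to name the interface, and there is never any ambiguity about which implementation is meant.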

4.16.2.6.2 Importing a service

To import a service, use the following syntax:

<daisy:import-service id="datasource" service="javax.sql.DataSource"/>

The id is the id for the bean in the local container. You will hence be able to pass the service to other beans using this id (as you would do for any other bean).

4.16.2.6.2.1 Order in which the containers are loaded

The order in which the container jars are specified to the Daisy Runtime is important: it needs to be such that the imports of a container are satisfied by exports done by earlier started containers.

4.16.2.6.2.2 Shielding of exported services

Daisy will not provide a direct pointer to the corresponding bean when importing a service. Rather, it creates a dynamic proxy implementing the service interface. This is to:
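
The dynamic proxy technique mentioned above can be sketched with the standard java.lang.reflect.Proxy class. The names ServiceProxySketch, Greeter and GreeterBean are hypothetical and exist only for this illustration. How the Daisy Runtime builds its proxies internally is not shown here.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

class ServiceProxySketch {
    // Wraps 'target' in a proxy that exposes only the methods of 'serviceInterface'.
    static <T> T shield(Class<T> serviceInterface, T target) {
        Object proxy = Proxy.newProxyInstance(
                serviceInterface.getClassLoader(),
                new Class<?>[] { serviceInterface },
                (p, method, args) -> {
                    try {
                        // Forward the call to the real bean.
                        return method.invoke(target, args);
                    } catch (InvocationTargetException e) {
                        // Rethrow the exception raised by the real bean.
                        throw e.getCause();
                    }
                });
        return serviceInterface.cast(proxy);
    }
}

// Hypothetical service interface.
interface Greeter {
    String greet(String name);
}

// Hypothetical bean with an extra method that is not part of the service.
class GreeterBean implements Greeter {
    public String greet(String name) {
        return "Hello " + name;
    }

    public void internalMethod() {
        // not reachable through the proxy
    }
}
```

A client holding the proxy can call greet, but it cannot cast the proxy to GreeterBean, so internalMethod stays out of reach.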

4.16.2.7 Daisy Runtime shutdown

When shutting down the Daisy Runtime, all Spring containers are destroyed in the reverse order that they were started.
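
Since containers can only import services from containers started before them, destroying them in reverse order means an importing container is stopped before the container that exports to it. A minimal sketch of this bookkeeping, using a hypothetical ShutdownOrderSketch class that tracks container ids only:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class ShutdownOrderSketch {
    // Used as a stack: the most recently started container is on top.
    private final Deque<String> started = new ArrayDeque<>();

    void start(String containerId) {
        started.push(containerId);
    }

    // Destroys all containers and returns their ids in destruction order.
    List<String> shutdown() {
        List<String> destroyed = new ArrayList<>();
        while (!started.isEmpty()) {
            destroyed.add(started.pop());
        }
        return destroyed;
    }
}
```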

4.16.2.8 Starting the Daisy Runtime using the CLI

The Daisy Runtime CLI provides a couple of useful options, use the -h (help) argument to see them.

For example, in the binary distribution the Daisy Runtime is started by the daisy-repository-server script:

cd $DAISY_HOME/repository-server/bin
daisy-repository-server -h

In a development setup, this is:

cd <source tree root>/repository/server
start-repository -h

4.16.3 Component configuration

4.16.3.1 The API

The current configuration system is an interim solution, mainly for backwards compatibility with our older runtime environment (Avalon Merlin). Nonetheless, it's not bad, though in the future we'll likely add more advanced features like runtime reloading and notification of configuration changes, and drop the use of Avalon-specific APIs.

For representing configuration information, we use, for historical reasons, the Avalon Configuration API. In this API, the configuration data is modeled as a tree of Configuration objects. Each Configuration object can have attributes and children. This structure maps quite well onto XML, though it doesn't support mixed content (= sibling text and element nodes). The Configuration API provides convenient methods for retrieving the data as various non-string types, e.g. Configuration.getAttributeAsInteger(name)

Components (beans) in need of configuration data don't read this directly from files, but retrieve it using a logical name from a component called the ConfigurationManager.

While beans could depend on the ConfigurationManager component, we would rather not let them retrieve the information themselves, but supply them directly with the Configuration object. Also, sometimes we'd like to be able to specify default configuration information. For these purposes, we made a Spring extension.

4.16.3.1.1 Example

Suppose we have a class Foo like this:

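import org.apache.avalon.framework.configuration.Configuration;
import org.apache.avalon.framework.configuration.ConfigurationException;

public class Foo {
    // getValue() throws a ConfigurationException if the element has no value
    public Foo(Configuration conf) throws ConfigurationException {
       System.out.println(conf.getChild("message").getValue());
    }
}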

We could now supply it with the Configuration object like this:

<beans              xmlns = "http://www.springframework.org/schema/beans"
                xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
              xmlns:daisy = "http://outerx.org/daisy/1.0#runtime-springext"
               xmlns:conf = "http://outerx.org/daisy/1.0#config-springext"
       xsi:schemaLocation = "http://www.springframework.org/schema/beans
                             http://www.springframework.org/schema/beans/spring-beans-2.0.xsd
                             http://outerx.org/daisy/1.0#runtime-springext
                             http://daisycms.org/schemas/daisyruntime-springext.xsd
                             http://outerx.org/daisy/1.0#config-springext
                             http://daisycms.org/schemas/config-springext.xsd">

  <daisy:import-service id="configurationManager"
                         service="org.outerj.daisy.configuration.ConfigurationManager"/>

  <bean id="thing1" class="Foo">
    <constructor-arg>
      <conf:configuration group="mythings" name="foo" source="configurationManager"/>
    </constructor-arg>
  </bean>

</beans>

Note the following things:

Optionally, one can specify default configuration:

  <bean id="thing1" class="Foo">
    <constructor-arg>
      <conf:configuration group="mythings" name="foo" source="configurationManager">
        <conf:default xmlns="">
          <message>No message configured!</message>
        </conf:default>
      </conf:configuration>
    </constructor-arg>
  </bean>

4.16.3.2 Configuring the configuration

So where does the ConfigurationManager get its configuration from? From an XML file. For a default repository installation, this file is called myconfig.xml and can be found in the repository data directory.

The myconfig.xml looks like this:

<targets>
  <target path="mythings/foo">
    <configuration>
      <message>Hello world!</message>
    </configuration>
  </target>
</targets>

The format of this file is again, for compatibility reasons, the same as it was for Avalon Merlin, where the path attributes pointed to targets (components) for which the configuration was intended.

The path attribute contains the group/name specified when retrieving the configuration. There can be as many of these <target> elements as desired.

4.16.3.2.1 Backwards compatible paths

If you look in the actual myconfig.xml of the repository server, you'll see things like this:

<target path="/daisy/repository/blobstore">

This path attribute does not correspond to the group/name structure. It is, however, supported for backwards compatibility with the old runtime. Supporting these old paths avoids the need for users to update their existing configuration files and (for us) to adjust utilities which read/update the configuration file.

The ConfigurationManager maintains a mapping of old paths to new paths, this mapping is used when reading the myconfig.xml.

4.16.3.3 Configuration merging

When a default configuration is specified (using conf:default), and there is also an actual configuration, both are merged by means of a CascadingConfiguration.
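
The effect of such merging can be pictured as a cascading lookup. The sketch below uses plain maps instead of Avalon Configuration objects, and the class CascadingLookupSketch is hypothetical. It assumes that a value from the actual configuration takes precedence and that the default is only used as a fallback. It is not the implementation of CascadingConfiguration.

```java
import java.util.HashMap;
import java.util.Map;

class CascadingLookupSketch {
    private final Map<String, String> actual;
    private final Map<String, String> defaults;

    CascadingLookupSketch(Map<String, String> actual, Map<String, String> defaults) {
        // Copy the maps so later changes by the caller have no effect.
        this.actual = new HashMap<>(actual);
        this.defaults = new HashMap<>(defaults);
    }

    // Returns the actual value if present, otherwise the default (or null).
    String getValue(String key) {
        if (actual.containsKey(key)) {
            return actual.get(key);
        }
        return defaults.get(key);
    }
}
```

With the Foo example above, this would mean that "Hello world!" from myconfig.xml wins over "No message configured!" from conf:default, while the default is used when myconfig.xml has no entry for mythings/foo.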

4.16.3.4 Further exploration

The sources of the configuration stuff can be found in the Daisy source tree at

services/configuration

4.16.4 Logging

4.16.4.1 Logging API

In the repository server all logging is performed using the commons-logging API.

It is then up to the environment in which the repository server is started to set up a concrete logging implementation.

In the Daisy Runtime CLI we replaced the commons-logging API by its clone jcl-over-slf4j, the actual logging engine is log4j.

4.16.4.2 Logging tips

The Daisy Runtime CLI has some handy options for logging, mostly intended for use during Daisy development:

So for example:

start-repository -l debug

will cause all debug logging to be sent to the console (during startup this will be quite a lot). To only see logging produced by Daisy, one could do:

start-repository -l debug -m org.outerj.daisy

4.16.5 Launcher

The purpose of the launcher is to easily start the repository server without the need to add all the required implementation jars to the classpath of your project. When using the launcher, you only need the launcher jar on the classpath, and the launcher will then construct a classloader containing the required dependencies. This also means that you don't need to update classpaths if they change between Daisy versions.

More precisely, the launcher supports launching of 3 different things:

Except for the CLI, you usually start these things with the purpose of being able to talk to them. For this, you still need the required repository APIs in your classloader.

The launcher jar is also executable using java -jar, in which case it will start the Daisy Runtime CLI.

Some pointers to examples:

4.17 Repository Implementation

This section contains some information on internals of the repository server. It is mostly only relevant for people who want to hack on Daisy itself.

4.17.1 Repository Implementation

We have mentioned before that there are two implementations of the repository API: one we call local (the one in the repository server) and one we call remote. In this document we're going to look into how the repository objects are implemented to support both local and remote implementations.

4.17.1.1 Repository server implementation

The implementation of the repository can be found in the source tree in the repository directory. This is the structure of the repository directory:

+ repository
    + api
    + client
    + common
    + server
    + server-spi
    + spi
    + test
    + xmlschema-bindings

The api directory contains the repository API definition, these are mainly interfaces. Repository client applications only need to depend on the api jar and the xmlschema-bindings jar (thus when compiling a client program, only these jars need to be in the classpath).

The client directory contains the remote implementation, the server directory contains the local implementation. The common directory contains classes common to both implementations (this is discussed further on). The spi directory contains "Service Provider Interfaces", these are interfaces towards extension components. The server-spi directory is similar to the spi directory, but contains interfaces that are only relevant for the server implementation. The test directory contains functional tests. These tests automate the process of constructing an empty database, starting an embedded repository server, and then execute a JUnit test. The xmlschema-bindings directory contains XML schemas for the XML formats used by the repository, these are compiled into corresponding Java classes by use of XMLBeans. Logically speaking these classes are a part of the repository API.

Next to the repository directory, the services directory also contains a good amount of functionality that is used by the repository client or server. The subprojects in the services directory are however separated from the main repository code because either they are completely independent from it (and reusable and testable outside of the Daisy repository code), or they are repository extensions.

4.17.1.2 The local, remote and common implementations

If we consider an entity such as a Document, a User or even an Acl, a lot of the implementation of these interfaces will be identical in the local and remote implementations: in both cases they need instance variables to hold the data, and implementations of the various methods. In fact, for most of these entities, the only difference is the implementation of the save method. In the local implementation, the save method should update the database, while in the remote implementation, the save method should use an appropriate HTTP call to perform the operation.

Therefore, the basic implementation of these objects is separated out in the "common-impl" subproject, which delegates things that are specific for local or remote implementation to a certain Strategy interface. The Strategy interface is then implemented differently for the local and remote implementations.

The diagram below depicts this basic organisation of the code.

Not all operations of the repository are of course loading and saving entities, there are also items such as querying and ACL-checking, and these are also delegated by the common-impl to the appropriate strategy instance.

So practically speaking, we could say that most of the real work happens in the implementations of the Strategy interfaces. You can find the Strategy interfaces by going to the repository/common directory and searching for all *Strategy.java files.
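
The division of labour between the common implementation and the strategies can be sketched as follows. All names in this sketch (SketchLoadStoreStrategy, SketchDocument, LocalSketchStrategy, RemoteSketchStrategy) are hypothetical and heavily simplified. The real Strategy interfaces are the *Strategy.java files mentioned above, and the strategies here only record what they would do instead of touching a database or the network.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical strategy interface: the part that differs per implementation.
interface SketchLoadStoreStrategy {
    void store(SketchDocument document);
}

// Common implementation: holds the data and delegates persistence.
class SketchDocument {
    private final SketchLoadStoreStrategy strategy;
    private String name;

    SketchDocument(SketchLoadStoreStrategy strategy, String name) {
        this.strategy = strategy;
        this.name = name;
    }

    String getName() {
        return name;
    }

    void setName(String name) {
        this.name = name;
    }

    // The only operation that behaves differently locally and remotely.
    void save() {
        strategy.store(this);
    }
}

// Local flavour: would update the database.
class LocalSketchStrategy implements SketchLoadStoreStrategy {
    final List<String> operations = new ArrayList<>();

    public void store(SketchDocument document) {
        operations.add("database update: " + document.getName());
    }
}

// Remote flavour: would perform an HTTP call to the repository server.
class RemoteSketchStrategy implements SketchLoadStoreStrategy {
    final List<String> operations = new ArrayList<>();

    public void store(SketchDocument document) {
        operations.add("HTTP request: " + document.getName());
    }
}
```

The same SketchDocument class serves both cases; which strategy instance it is constructed with decides whether save() ends up in the database or in an HTTP call.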

4.17.1.3 User-specific and common objects

Let's do a quick review of the Daisy API, for example:

// Getting a repository instance
Repository repository = repositoryManager.getRepository(new Credentials("user", "password"));

// Retrieving a document
Document document = repository.getDocument(55, false);

// Creating a collection
CollectionManager collectionManager = repository.getCollectionManager();
DocumentCollection newCollection = collectionManager.createCollection("abc");
newCollection.save();

The RepositoryManager.getRepository() method returns a caller-specific Repository object. "Caller-specific" here means a custom instance for the thread that called the getRepository() method. The Repository object then remembers the user it belongs to, so further methods don't need a credentials parameter.

If later on we do a call to repository.getCollectionManager(), the returned collectionManager instance is again caller-specific, thus it knows it represents the authenticated user and we don't need to specify credentials when calling further methods.

The implementations of interfaces like Repository (RepositoryImpl) and CollectionManager (CollectionManagerImpl) simply delegate calls internally to a CommonRepository instance and a CommonCollectionManager instance, calling similar methods on them but with an additional user parameter. Thus RepositoryImpl and CollectionManagerImpl in a sense exist only to remember the authenticated user.
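As a rough illustration of this delegation, the sketch below uses hypothetical stand-in classes (CommonCollectionManagerSketch and CollectionManagerSketch) and represents the user as a plain string. The real CollectionManagerImpl and CommonCollectionManager have different signatures; only the shape of the delegation is the point here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names; the real classes are CollectionManagerImpl and
// CommonCollectionManager, whose actual signatures may differ.

/** Shared object: every method takes the user explicitly. */
class CommonCollectionManagerSketch {
    final List<String> audit = new ArrayList<>();

    String createCollection(String name, String user) {
        audit.add(user + " created " + name);
        return name;
    }
}

/** Caller-specific object: exists only to remember the authenticated user. */
class CollectionManagerSketch {
    private final CommonCollectionManagerSketch delegate;
    private final String user;

    CollectionManagerSketch(CommonCollectionManagerSketch delegate, String user) {
        this.delegate = delegate;
        this.user = user;
    }

    /** Same method as on the shared object, minus the user parameter. */
    String createCollection(String name) {
        return delegate.createCollection(name, user);
    }
}
```

Two callers can each hold their own lightweight wrapper while sharing a single common instance behind it.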

A diagram which illustrates all this is available here (it's not embedded in the page because it is a bit wide).

As you'll see in the diagram, a call for getDocument on the repository delegates this call to the CommonRepository instance, which then in itself delegates the call to a LoadStoreStrategy class. If the CommonRepository class simply forwards the call to the LoadStoreStrategy class, you could wonder why this extra layer of delegation still exists and thus why the CommonRepository and its corresponding LoadStoreStrategy are not merged into one. Or in general, why this is not done for all *Manager classes (CollectionManager, AccessManager, UserManager, etc.) and their corresponding Common* classes. For a big part, this is because this is how it historically evolved, though the Common* classes still perform some functions such as caching which are (sometimes) shared between the local and remote implementations.

4.17.2 Database schema

The image below shows the database schema of the Daisy repository. The actual content of parts is stored in files on the hard disk; the blob_id column in the parts table contains the filename (or more correctly, the id used by the BlobStore component to retrieve the data, but this is currently the same as the file name).

Never make changes to the database directly, always use the repository APIs.

5 Daisy Wiki

The Daisy Wiki is a generic web-based frontend to the repository server. It provides both publishing and editing/management features. Please see the feature overview page for a comprehensive introduction.

5.1 Daisy Wiki Sites

5.1.1 What is a Daisy Wiki "site"?

A Daisy Wiki site is a specific view on a Daisy Repository. A site is configured with a default collection (the concept of document collections is explained on the documents page). Full text searches and recent changes are automatically limited to only show documents from that default collection. New documents created via the site are by default assigned to that collection. Each site can have its own navigation tree, and is configured with a specific document as the homepage of the site.

A site is configured with a certain branch and language: for any document consulted via that site, the shown document variant will depend on that branch and language.

The Repository Server isn't aware of the concept of sites, nor does the site concept partition the repository in any way.

5.1.2 Defining sites

Sites are defined by creating a directory for the site and putting a siteconf.xml file in it. This directory should be created in the "sites" directory. By default, this sites directory is located at:

<wikidata directory>/sites

The location of this directory can be changed in the cocoon.xconf.

The content of the siteconf.xml file should strictly adhere to a certain schema (thus no extra elements/attributes are allowed), otherwise the site will be ignored (in that case, an error will be logged in Cocoon's log files).

5.1.2.1 siteconf.xml syntax

An example siteconf.xml is displayed below.

<siteconf xmlns="http://outerx.org/daisy/1.0#siteconf">
  <title>foobar</title>
  <description>The "foobar" site</description>
  <skin>default</skin>
  <navigationDocId>1-DSY</navigationDocId>
  <homepageDocId>2-DSY</homepageDocId>
  <!-- homepage>....</homepage -->
  <collectionId>1</collectionId>
  <!-- collectionName>myCollection</collectionName -->
  <contextualizedTree>false</contextualizedTree>
  <!-- navigationDepth>4</navigationDepth -->
  <branch>main</branch>
  <language>default</language>
  <defaultDocumentType>SimpleDocument</defaultDocumentType>
  <publisherRequestSet>default</publisherRequestSet>
  <siteSwitching mode="all"/>
  <newVersionStateDefault>publish</newVersionStateDefault>
  <locking>
    <automatic lockType='pessimistic' defaultTime='15' autoExtend='true'/>
  </locking>
  <!--
  <documentTypeFilter>
    <include name="foo*"/>
    <exclude name="bar*"/>
  </documentTypeFilter>
  -->
</siteconf>

Element | Required | Description
title | yes | a (typically short) title for the site
description | yes | a description for the site, shown on the sites overview page
skin | yes | the skin to use for this site
navigationDocId | yes | the ID of the navigation document
homepageDocId | one of these | the ID of the homepage
homepage | one of these | a path to the homepage, used instead of the homepageDocId. Usually this is a path to a Wiki extension (ext/something)
collectionId | one of these | the ID of the default collection for the site
collectionName | one of these | the name of the default collection of the site. Will only be used when collectionId is not set.
contextualizedTree | yes | true or false. Indicates whether the navigation tree should be shown in full (= when false), or if the navigation tree should only have open branches leading to the selected node (= when true)
navigationDepth | | always displays the first n levels of the navigation tree. When using this with contextualizedTree=true, the first n levels will always be shown no matter what, and more may be shown as you progress through the navigation. When using contextualizedTree=false, only the first n levels will be shown, no matter where the current document happens to be in the navigation.
branch | no, default main | default branch for the site (specify either the branch ID or name)
language | no, default "default" | default language for the site (specify either the language ID or name)
defaultDocumentType | no | the default document type for this site. The document type can be specified either by ID or by name.
publisherRequestSet | no | which publisher request set to use for the p:preparedDocuments publisher request for pages rendered in this site.
siteSwitching | no | defines if the browser should be redirected to another site if a document is better suited for display in another site. Valid values for the mode attribute are: stay (never switch to another site), all (consider all available sites as sites to switch to), selected (consider only selected sites, listed in <site> child elements inside the <siteSwitching> element). For more information see URL space management.
newVersionStateDefault | yes | publish or draft. This indicates the default state of the "Publish changes immediately" flag on the edit screen.
locking | | the locking strategy to use. To use no locking at all, remove the <automatic> element (but leave the empty <locking> element). To use warn-locking, i.e. only warning that someone else is editing the page but still allowing concurrent edits, change the lockType attribute to "warn".
documentTypeFilter | no | specifies a filter for the document types that should be visible when creating a new document within this site. Zero or more include and/or exclude patterns can be specified; the order of the patterns is of no importance. Document types will be shown if they match at least one include pattern and no exclude pattern. If there are only exclude patterns, an implicit <include name="*"/> is assumed.

The patterns can be literal strings, or can contain the wildcards * and ?. The wildcard * matches zero or more characters. The wildcard ? matches exactly one character. To match one or more characters, you can use ?*. While document type names can't contain these characters, for completeness we mention that the wildcards can be escaped using \* and \?, and backslash when used in this context can be escaped using \\ (thus \\* and \\?).

Note that this feature is not an access control, it forbids nothing, it just filters the document type list when shown.
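The matching rules of the documentTypeFilter described above can be sketched in Java as follows. This is an illustration only, not the Daisy implementation: the class DocumentTypeFilterSketch and its methods are hypothetical, and the patterns are translated to java.util.regex expressions purely for convenience.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Sketch only: DocumentTypeFilterSketch is a hypothetical class, not Daisy code.
class DocumentTypeFilterSketch {
    private final List<Pattern> includes = new ArrayList<>();
    private final List<Pattern> excludes = new ArrayList<>();

    DocumentTypeFilterSketch(List<String> includeGlobs, List<String> excludeGlobs) {
        // Only exclude patterns given: act as if <include name="*"/> were present.
        // (The case with no patterns at all is not covered by the rules above;
        // this sketch then simply matches nothing.)
        List<String> effectiveIncludes =
                includeGlobs.isEmpty() && !excludeGlobs.isEmpty() ? List.of("*") : includeGlobs;
        for (String glob : effectiveIncludes) includes.add(toRegex(glob));
        for (String glob : excludeGlobs) excludes.add(toRegex(glob));
    }

    /** Translates a pattern with * and ? wildcards into a regular expression. */
    static Pattern toRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < glob.length(); i++) {
            char c = glob.charAt(i);
            if (c == '\\' && i + 1 < glob.length()) {
                // A backslash escapes the next character (\*, \? or \\).
                regex.append(Pattern.quote(String.valueOf(glob.charAt(++i))));
            } else if (c == '*') {
                regex.append(".*");   // zero or more characters
            } else if (c == '?') {
                regex.append('.');    // exactly one character
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    /** Shown if at least one include pattern matches and no exclude pattern does. */
    boolean isVisible(String documentTypeName) {
        boolean included = includes.stream().anyMatch(p -> p.matcher(documentTypeName).matches());
        boolean excluded = excludes.stream().anyMatch(p -> p.matcher(documentTypeName).matches());
        return included && !excluded;
    }
}
```

With the include pattern foo* and the exclude pattern bar* from the example siteconf.xml, a document type named fooDoc would be listed, while SimpleDocument would not, since it matches no include pattern.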

5.1.2.2 Creating a site

Again, all you need to do to define a new site is to create a new subdirectory in the sites directory and put a valid siteconf.xml in it.

5.1.2.3 Removing a site

To make a site unavailable, you can:

5.1.2.4 Runtime detection of new/updated/deleted siteconf's

Changes to the site configurations are picked up automatically; there is no need to restart the Daisy Wiki. It can take up to 10 seconds before Daisy notices your changes (this interval is configurable in the cocoon.xconf). If you don't see a site appearing, check the Cocoon log files for errors.

5.1.2.5 Site filtering

The list of sites displayed to the user is filtered based on whether the user has access to the homepage document of the site. In case a custom homepage path is used (<homepage> instead of <homepageDocId>), you can still specify the homepageDocId to cause filtering. If this is not done, the site will always be displayed in the list.

5.1.3 Creating a new site using daisy-wiki-add-site

If you want to create a new site, including a new collection, a new navigation tree and a new homepage document, you can use the daisy-wiki-add-site program for this, which will automatically perform these steps for you and put a new siteconf.xml in the sites directory. To do this, open a command prompt, make sure DAISY_HOME is set, go to DAISY_HOME/install and execute:

daisy-wiki-add-site <location of wikidata directory>

5.1.4 Other site-features

5.1.4.1 skinconf.xml

Maintaining a custom skin can be more work than you'd like to put into it; therefore it is also possible to customise (or parameterise) existing skins. For example, for the default skin you can alter the logo in this way.

This is done by putting a file called skinconf.xml in the appropriate site directory. The contents of this file will be merged in the XML pipelines and hence be available to the XSL stylesheets. The required content and format of this file depends upon what the skin you use expects.

If a site-specific skinconf.xml is not provided, the system will use the skinconf.xml found in the root of the sites directory, if it exists.

For more information on skinconf.xml, see here.

5.1.4.2 Extension sitemaps

See Daisy Wiki Extensions.

5.2 Daisy Wiki Editor Usage Notes

5.2.1 Introduction

This document describes the editor used to modify pages stored in the document repository. The editor features wysiwyg editing.

5.2.1.1 Where do I find the editor?

The editor can be reached by either editing an existing document or creating a new document.

To edit an existing document, use the Edit link in the document action menu. This link is only visible if you are allowed to edit the document.

To create a new document, select the New Document link in the menu. You are then first presented with a list of document types, after selecting one the editor will open.

5.2.1.2 Document type influence

The content of the edit screen depends somewhat on the document type of the document you're editing or creating. See documents for a general discussion on documents and document types. As a quick reminder, a document can consist of multiple parts and fields. The parts contain the actual content, the fields are for more structured (meta)data.

If a part is marked as a "Daisy HTML" part, you will be presented with a wysiwyg editor for that part. Otherwise, a file upload control will be shown. Because of the ability to plug in custom part editors, other types of editors might also appear (e.g. for editing book definitions).

5.2.1.3 Supported browsers

In theory, the document editing screen should work on most browsers. However, to use wysiwyg editing, it is advisable to use a recent version of one of the mainstream browsers, Mozilla/Firefox or Internet Explorer. We do most of our testing using recent versions of Firefox and Internet Explorer 6/7.

On other browsers, the editor will fall back to a textarea allowing you to edit the HTML source. On browsers that support wysiwyg editing, you can also switch to source editing.

In any case, Javascript (and cookies) must be enabled.

5.2.1.4 Heartbeat

While editing a page, the server keeps some state about your editing session. After a certain period of inactivity, the server will clean up the editing session. To prevent the editing session from expiring while you're working on a document, a 'heartbeat' signal keeps your session alive. The heartbeat signal also serves to extend your lock on the document. (Technically speaking, the heartbeat signal is an Ajax request.)

5.2.1.5 Document locking

When you edit an existing document, the Daisy Wiki will automatically take an exclusive lock on the document to ensure nobody else can edit the document while you're working on it. The initial lock lasts 15 minutes and is then automatically extended as needed via the heartbeat signal.

If you start editing a page but decide you didn't want to after all, it is best to use the "Cancel editing" button at the bottom of the edit screen, so that your lock gets cleared. If you don't do this, the lock will expire after at most 15 minutes, so this is not a big problem.

The locking behaviour can be adjusted by the site administrator. For example, locking can be turned off completely. However, we expect that in most cases it will be left to the default behaviour described here.

5.2.1.6 Editing multiple documents at once

Editing multiple documents concurrently in different browser windows or tabs is supported.

5.2.2 Supported HTML subset and HTML cleaning

Although a wysiwyg editor is shown for the "Daisy HTML" parts, the goal is to limit the editing to a subset of HTML mainly focussing on structural aspects of HTML. So forget fonts, colors, special styling tricks, embedded javascript, and so on. Inserting those while editing in source view won't work either, as the HTML is cleaned up on the server side.

This cleanup process can also be triggered manually, by pressing the "Cleanup edited HTML" button. This can be useful if you pasted content copied from an external application and you want to see how it will look finally. When switching from wysiwyg to source view, the cleanup is also performed.

5.2.2.1 Supported HTML subset

These are the supported tags (or "elements") and attributes:

All tags not listed above will be removed (but their character content will remain). On the block-type elements and images, the id attribute is supported. For the most accurate list of elements and attributes, have a look at the htmlcleaner.xml file (see below).

The supported tags can have any content model as allowed by the HTML DTD, but of course limited to the supported tags. If an element occurs in a location where it is not supported, an ancestor is searched where it is allowed and the containing element(s) are ended, the element inserted, and the containing elements reopened. This happens for example when a <table> occurs inside a <p>.

<b> and <i> are translated to <strong> and <em> respectively, as are <span> tags with font-weight/font-style specifications.

If two or more <br> tags appear after one another, this is translated to a paragraph split. The meaningless <br>'s that the Mozilla editor tends to leave everywhere are removed. Text that appears directly in the <body> is wrapped inside <p> elements.

<br> tags inside <pre> are translated to newline characters.

The result is serialized as an HTML document that is well-formed XML (but not XHTML), UTF-8 encoded. Lines are split at 80 characters (if possible), and meaningless whitespace is removed.

All this should also ensure that the resulting HTML is (mostly) the same whether it is edited using Mozilla or Internet Explorer.

The supported tags, attributes and classes for <p> are not hardcoded but can be configured in a file (htmlcleaner.xml). However, making arbitrary adjustments to this file is not supported (the html-cleaner code expects certain tags to be there). Adding new tags or attributes should generally not be a problem, but those won't have the necessary GUI editing support unless you implement that also.

5.2.3 Images

Images can be inserted either by browsing for an existing image in the repository, or by uploading a new image in the repository. You can also insert images that are not in the repository, but available at some URL.

You can change the alignment of the images (using the usual text-align buttons), and change how the text flows around the image. This last option won't have effect in the PDF output.

Note that images are also documents in the repository, and are thus versioned like any other document. If you have an updated version of an image you want to insert, it is recommended NOT to delete the existing image and upload the new image, but rather to go to the document editor for that image (you can use the "Open image in new window" toolbar button for this), and upload the new version over there.

Currently it is hardcoded that images should have an "ImageData" part. They can however be of any document type.

5.2.4 Links

The format of links to other documents in the daisy repository is:

daisy:<document id>
for example:
daisy:167

The daisy link can furthermore include branch, language and version specifications:

daisy:<document id>@<branch name or id>:<language name or id>:<version id>

Each of these additional parts is optional. For example to link to version 5 of document 167, on the same branch and language as the current document variant, use:

daisy:167@::5

The <version id> can be a number or the strings last or live.
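To make the link syntax concrete, here is a minimal Java sketch that splits a "daisy:" link into its document ID, branch, language and version. The class DaisyLinkSketch is a hypothetical helper written for this page, not part of the Daisy API, and it deliberately ignores anything beyond the syntax shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: DaisyLinkSketch is a hypothetical helper, not part of the Daisy API.
class DaisyLinkSketch {
    // daisy:<document id>[@<branch>[:<language>[:<version>]]]
    private static final Pattern LINK =
            Pattern.compile("daisy:([^@#:]+)(?:@([^:#]*)(?::([^:#]*))?(?::([^:#]*))?)?");

    final String documentId;
    final String branch;    // null: same branch as the current document variant
    final String language;  // null: same language as the current document variant
    final String version;   // a number, "last" or "live"; null when not specified

    private DaisyLinkSketch(String documentId, String branch, String language, String version) {
        this.documentId = documentId;
        this.branch = branch;
        this.language = language;
        this.version = version;
    }

    static DaisyLinkSketch parse(String link) {
        Matcher m = LINK.matcher(link);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a daisy link: " + link);
        }
        return new DaisyLinkSketch(m.group(1), emptyToNull(m.group(2)),
                emptyToNull(m.group(3)), emptyToNull(m.group(4)));
    }

    /** Omitted and empty parts are both treated as "not specified". */
    private static String emptyToNull(String s) {
        return (s == null || s.isEmpty()) ? null : s;
    }
}
```

For the example daisy:167@::5, parse() yields document ID 167, version 5, and no branch or language, which matches the "same branch and language as the current document variant" reading given above.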

If you don't know the id of a document by heart (which is likely the case), use the "Create link by searching" button on the toolbar.

A link can furthermore contain a fragment identifier. A fragment identifier is used to directly link to a specific element (e.g. a heading or a table) in a document. For this you first need to assign an ID to the element you want to link to (there is an "Edit ID" button for this on the toolbar), and then you can adjust the link. The link editor dialogs make this easy by letting you browse for available element IDs.

5.2.5 Upload and link ("attachment")

TODO

5.2.6 Includes

5.2.6.1 Including other Daisy documents

It is possible to include other documents into the document you are editing. This can be done in two ways:

To look up the actual document to include, use the toolbar button "Search for document to include". This will automatically insert an appropriate "daisy:" link.

After the "daisy:" link, you can put some whitespace (e.g. a space character), and then put whatever additional text you want. This text will not be published, but is useful to leave a comment (e.g. the name of the included document).

By default the included document will look exactly as when it is displayed stand-alone. It is however possible to let the headings in the document shift (e.g. let a h1 become a h2 or h3) so that it better fits within the context. This heading shifting can be specified by opening the "Include settings" dialog using a toolbar button. In the HTML source, the include preference is stored in an attribute named daisy-shift-headings.

The editor will automatically show a preview of the included documents. The preview shows the content of Daisy-HTML parts and the fields, however without document type styling or heading-shifting applied. There are toolbar buttons to refresh or remove the previews. The include previews are not editable: when you try to edit them you will be asked to open the included document in a new window. Due to limited control over the in-browser editors, you might find ways to edit the include previews anyway (e.g. using drag and drop), however these edits will be ignored.

When a document is published (= displayed), the includes are processed recursively. If an endless include-recursion is detected, an error notice will be shown at the location of the include.
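One common way to detect such endless recursion is to remember which documents are currently being expanded and to stop when one of them is encountered again. The Java sketch below illustrates that general technique; it is not taken from the Daisy sources, and the class name, the map standing in for the repository, and the output format are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Sketch only: the class name and the map-based "repository" are hypothetical.
class IncludeExpansionSketch {
    /** Maps a document ID to the IDs of the documents it includes. */
    private final Map<String, List<String>> includes;

    IncludeExpansionSketch(Map<String, List<String>> includes) {
        this.includes = includes;
    }

    /** Returns a flat rendering of the document with its includes expanded. */
    String publish(String docId) {
        StringBuilder out = new StringBuilder();
        expand(docId, new ArrayDeque<>(), out);
        return out.toString();
    }

    private void expand(String docId, Deque<String> ancestors, StringBuilder out) {
        if (ancestors.contains(docId)) {
            // The document is already being expanded higher up: emit a notice and stop.
            out.append("[error: recursive include of ").append(docId).append("]");
            return;
        }
        out.append("<").append(docId).append(">");
        ancestors.push(docId);
        for (String child : includes.getOrDefault(docId, List.of())) {
            expand(child, ancestors, out);
        }
        ancestors.pop();
        out.append("</").append(docId).append(">");
    }
}
```

Because only the chain of ancestors is tracked, a document that is included twice in different places is still expanded both times; only a true cycle produces the error notice.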

5.2.6.2 Including content retrieved from arbitrary URLs

It is also possible to include other sources into your document, for example "http:" or "cocoon:" URLs (however, see Include Permissions). In that case, those URLs must produce an embeddable chunk of HTML in the form of well-formed XML. These includes are currently only supported in the HTML publishing, thus not for PDFs.

5.2.7 Embedded queries

It is possible to embed a query in a page. To do this, put your cursor on an empty line, and choose "Query" in the style dropdown. The style of the paragraph will change to indicate it will now be interpreted as a query. Then enter the query in the paragraph.

The query must be written in the Daisy Query Language. It is advisable to first try out your query via the "Query Search" page, and once it works and gives the expected results, to copy and paste it in the document.

If you save a document containing an invalid query, an error notice will be shown at the location of the query.

5.2.8 Query and Include

The "Query-Include" option allows you to specify a query, and the documents returned by that query will be included (rather than showing the query results). This allows you to quickly create an aggregated document without needing to manually insert includes.

It is not important what you put in the "select" part of the query, you can simply do "select id where ....".

In the same way as for includes, it is possible to specify that heading-shifting should be performed.

5.2.9 IDs and fragment identifiers

It is not only possible to link to a document, but also to a specific location in a document. The element to which you want to link (a header, image, ...) must have an ID assigned. To do this, place the cursor inside the element to which to assign the ID, and then press the "Edit ID" button on the toolbar.

Then to link to the specific element, just insert a link like you always do. Both the "Create link" and "Create link by searching" dialogs allow to select the ID from the target document (in the "fragment ID" field). In the HTML source, the target ID is specified in the link as in this example:

daisy:5#notes

This link will cause the browser to scroll to the element with an ID attribute with the value "notes". The part starting from the hash sign is called the "fragment identifier".
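Since a "daisy:" link has the general shape of a URI, the standard java.net.URI class can be used to separate the fragment identifier from the rest of the link. The helper class FragmentSketch below is hypothetical and only demonstrates this; how the Daisy Wiki itself parses links is not shown here.

```java
import java.net.URI;

// Sketch only: shows that java.net.URI can split a "daisy:" link into its parts.
class FragmentSketch {
    /** Returns the fragment identifier of a link, or null when there is none. */
    static String fragmentOf(String link) {
        return URI.create(link).getFragment();
    }

    /** Returns the part between "daisy:" and the optional "#fragment". */
    static String targetOf(String link) {
        return URI.create(link).getSchemeSpecificPart();
    }
}
```

For daisy:5#notes, fragmentOf returns "notes" and targetOf returns "5"; for a link without a hash sign, fragmentOf returns null.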

Explicit anchor elements (e.g. HTML <a name="notes"/>) are not supported, as this sort of element is not visible in the wysiwyg editor, so users would be working blindly if they were used (deleting or moving them without being aware of it, and being unable to edit them in wysiwyg mode).

5.2.10 Editor shortcuts

Shortcut | Function
ctrl+b | bold
ctrl+i | italic
ctrl+z | undo
ctrl+y | redo
ctrl+c | copy
ctrl+x | cut
ctrl+v | paste
ctrl+1, ctrl+2, ... | switch to header level 1, 2, ...
ctrl+a | select all
ctrl+q | switch between bullets / no bullets
ctrl+r | remove formatting (same as the gum icon) (since Daisy 1.1)

5.2.11 Editing hints

5.2.11.1 Firefox and Mozilla

Pressing enter once in Firefox inserts a newline (a <br>). To start a new paragraph, press enter twice.

The toolbar buttons for cut/copy/paste won't work because of security restrictions, though you can configure Firefox to allow this for a specific site. More information is given when you click on one of these buttons while in Firefox. However, using the keyboard shortcuts you can perform these operations without any special configuration.

When you add a link or apply a styling to some words on the end of a line, it might be difficult (read: impossible) to 'move after' the link or styling. You can interrupt the link or styling by moving the cursor to the end of the line, and pressing the 'Remove link' or 'Remove formatting' button (thus without making a selection).

5.2.11.2 Internet Explorer (IE)

Merging table cells in IE is a bit counter-intuitive. You cannot simply select multiple cells and click on the merge cell button. Instead, put the cursor in one cell, and click on the merge cell button. You will then be asked how many rows and columns you want to merge.

5.2.11.3 All browsers

To copy content from one document to the other, it is recommended to open the source document in the editor and copy from there. This way you make sure you copy the original source content, e.g. with "daisy:" links intact.

5.2.11.4 Editing fields

5.2.11.4.1 Editing hierarchical field values

Hierarchical field values are entered as a slash-separated sequence. For example: abc / def / ghi. The whitespace around the separator slashes is not significant, it will be dropped. An extra slash at the start or end is allowed and will be ignored. Slashes separated by only whitespace will be dropped (i.o.w. they will not cause the addition of an element in the hierarchical path). If you want to use the slash character in a value itself, this can be done by escaping it as a double slash (//).
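The parsing rules above can be summarised in a short Java sketch. HierarchicalValueSketch is a hypothetical helper written for illustration, not the Daisy Wiki's own parser, and edge cases the text does not cover (such as three or more consecutive slashes) are handled here in whatever way falls out of the simple left-to-right scan.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: HierarchicalValueSketch is a hypothetical helper, not Daisy code.
class HierarchicalValueSketch {
    /** Splits "abc / def / ghi" into its path elements. */
    static List<String> parse(String input) {
        List<String> path = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '/' && i + 1 < input.length() && input.charAt(i + 1) == '/') {
                current.append('/');          // "//" is an escaped slash inside a value
                i++;
            } else if (c == '/') {
                addIfNotBlank(path, current); // separator: close the current element
            } else {
                current.append(c);
            }
        }
        addIfNotBlank(path, current);
        return path;
    }

    /** Trims surrounding whitespace and drops elements that are empty afterwards. */
    private static void addIfNotBlank(List<String> path, StringBuilder current) {
        String value = current.toString().trim();
        if (!value.isEmpty()) {
            path.add(value);
        }
        current.setLength(0);
    }
}
```

For example, parse("abc / def / ghi") gives the three elements abc, def and ghi, while parse("input//output / x") gives the two elements input/output and x.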

5.2.12 Character Set Information

By default, Daisy is configured to use Unicode (UTF-8) everywhere. For the part content you enter in the wysiwyg or source editor, you can use any character supported by Unicode (more correctly, any character for which Java supports Unicode). Metadata however, such as the document name, fields, etc. is stored in a relational database, MySQL, which needs to be configured with a certain encoding (in Western Europe often latin1) and hence is limited to the characters supported by that encoding. Contact your system administrator if you wish to know what encoding that is, and thus to what characters (glyphs) you're limited.

5.3 Embedding multimedia and literal HTML

5.3.1 Introduction

Daisy includes some default document types for easily embedding multi media and literal HTML. There are no special tricks involved in their implementation, you could easily create them yourself, but they are included for convenience.

5.3.2 Embedding multi media

This explains how you can upload a flash animation, a movie or a sound fragment using the MultiMediaObject document type.

5.3.2.1 Usage

Create a new document, choose the document type MultiMediaObject, and upload the item. There are some fields available to control various options, like height, width and looping.

Then, in the document in which you want to embed the multimedia item, include this multimedia document. For this, use the "Insert new include" button on the toolbar (of the rich text editor), and enter the ID of the document to include (you can look it up with the "Search for document to include" button).

5.3.2.2 Implementation note

The MultiMediaObject document type is simply a regular document type with which a document type specific XSLT is associated which inserts the HTML <object> and <embed> tags.

5.3.3 Embedding literal HTML

The default "Daisy-HTML" parts only allow a small, structured subset of HTML. Sometimes you might want to enter whatever HTML you like, most often to create HTML-based multi media. Another common example is including content from third-party sites such as YouTube. In that case, you can use the "Literal HTML" document type.

5.3.3.1 Usage

Create a document of type "Literal HTML", and enter the HTML in the editor. The HTML should be well-formed XML and enclosed by <html> and <body> tags. If this is not the case, the editor will automatically clean the HTML up (there's a 'sponge' icon to trigger this cleanup).

Only the content of the <body> tag will remain when the document is published.

To embed the newly created literal HTML document into another document, use the normal document include functionality.

5.3.3.2 Publisher request note

If you are using custom publisher requests, be aware that you need to enable the inlining of the "LiteralHtmlData" part. You can add this to the default publisher request (usually called default.xml), as shown here:

...
  <p:prepareDocument inlineParts="LiteralHtmlData"/>
...

5.4 Navigation

5.4.1 Overview

Daisy allows you to create hierarchical navigation trees for your site. Some of the features and possibilities:

The Daisy Wiki has an advanced GUI for editing the navigation trees, so that users are not confronted with the raw XML. It is of course possible to switch to a source view. Editing a navigation tree is done in the same way as any other document is edited.

It is possible to create readable URLs (i.e. URLs containing readable names instead of numbers) by basing the URL space on the navigation tree and assigning meaningful node IDs to nodes in the navigation tree. See the document about URL management.

The 'root' navigation document of a site is accessible through the [Edit navigation] link below the navigation tree, which is visible if you are logged on as a non-guest-role user. You can also get an overview of all navigation documents using this query:

select id, branch, language, name where documentType = 'Navigation'

5.4.2 Description of the navigation XML format

5.4.2.1 The empty navigation tree

The simplest possible navigation tree description is the empty one:

<d:navigationTree xmlns:d="http://outerx.org/daisy/1.0#navigationspec">
</d:navigationTree>

5.4.2.2 Document node

Adding document nodes to it is easy:

<d:navigationTree xmlns:d="http://outerx.org/daisy/1.0#navigationspec">
  <d:doc id="26"/>  
  <d:doc id="32">
    <d:doc id="15"/>  
  </d:doc>
</d:navigationTree>

As shown, the nodes can be nested.

By default, the navigation tree will display the name of the document as the label of a node. However, sometimes you might want to change that, for example if the name is too long. Also, when editing the navigation tree description as a source document, it will quickly become difficult to figure out which node stands for what. Therefore, you can add an attribute called "label" to the d:doc elements:

<d:navigationTree xmlns:d="http://outerx.org/daisy/1.0#navigationspec">
  <d:doc id="26" label="Introduction"/>  
  <d:doc id="32" label="Hot Stuff">
    <d:doc id="15" label="Fire"/>
  </d:doc>
</d:navigationTree>

By default the ID of a document node is the document ID, but you can assign a custom ID by specifying it in an attribute called nodeId. The custom ID should not start with a digit and not contain whitespace.
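The two constraints on custom node IDs are easy to check up front. The following Java sketch shows one way to do so; NodeIdSketch is a hypothetical helper, and rejecting an empty ID is an assumption of this sketch rather than something stated above.

```java
// Sketch only: NodeIdSketch is a hypothetical helper, not Daisy code.
class NodeIdSketch {
    /** Checks the two documented rules: no leading digit, no whitespace. */
    static boolean isValidCustomId(String nodeId) {
        if (nodeId == null || nodeId.isEmpty()) {
            return false; // assumption: an empty ID is not useful as a custom ID
        }
        if (Character.isDigit(nodeId.charAt(0))) {
            return false; // must not start with a digit
        }
        return nodeId.chars().noneMatch(Character::isWhitespace);
    }
}
```

An ID such as "intro" passes this check, whereas "26intro" and "hot stuff" do not.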

To link to a document on another branch or in another language, add a branch and/or language attribute on the d:doc element. The value of the attribute can be a branch/language name or ID. By default, documents are assumed to be on the same branch and in the same language as the navigation tree document itself.

5.4.2.2.1 Visibility

The d:doc node supports a visibility attribute. By default, nodes are visible. The visibility attribute lets you specify that a node should only become visible when it is active, or should never be visible at all. Non-visible nodes are useful to have the navigation tree opened up to a certain point, without displaying a too-deeply nested hierarchy or a large number of sibling nodes. In the Daisy Wiki, where the navigation tree controls the URL space, this can additionally be useful to control the URL path.

The syntax is:

<d:doc id="..." visibility="always|hidden|when-active"/>

5.4.2.3 Link node

To insert a link to an external location (a non-Daisy document), use the link element:

<d:navigationTree xmlns:d="http://outerx.org/daisy/1.0#navigationspec">
  <d:doc id="26" label="Introduction"/>  
  <d:doc id="32" label="Hot Stuff">
    <d:doc id="15" label="Fire"/>
  </d:doc>
  <d:link url="http://outerthought.org" label="Outerthought"/>
</d:navigationTree>

The attributes url and label are both required. The link element supports an optional id attribute.

5.4.2.3.1 Visibility

It is possible to hide link nodes depending on whether the user has read access to some other document (a guarding document). The syntax for specifying this document is:

<d:link url="http://www.daisycms.org" label="Daisy CMS"
        inheritAclDocId="78-DSY" inheritAclBranch="main" inheritAclLanguage="default"/>

As usual, specifying the branch and language is optional.

For the inheritAclDocId, one can use the special value "this" to refer to the Daisy document in which the navigation tree itself is stored. In this case, the value of the inheritAclBranch and inheritAclLanguage attributes, if present, is not used.
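A short sketch of the "this" variant, reusing the link from the example above:

```xml
<d:navigationTree xmlns:d="http://outerx.org/daisy/1.0#navigationspec">
  <!-- The guarding document is the navigation tree document itself,
       so no inheritAclBranch or inheritAclLanguage is given -->
  <d:link url="http://www.daisycms.org" label="Daisy CMS" inheritAclDocId="this"/>
</d:navigationTree>
```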

5.4.2.4 Group node

If you want to group a number of items below a common title, use the group element. The group element can optionally have an attribute called id to specify a custom id for the node (otherwise, the id is automatically generated, something like g1, g2, etc).

<d:navigationTree xmlns:d="http://outerx.org/daisy/1.0#navigationspec">
  <d:group label="Some title">
    <d:doc id="26" label="Introduction"/>  
    <d:doc id="32" label="Hot Stuff">
      <d:doc id="15" label="Fire"/>
    </d:doc>
  </d:group>
  <d:link url="http://outerthought.org" label="Outerthought"/>
</d:navigationTree>

The group node also supports the visibility attribute, see the document node for more information on this. 
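As an illustration, a group with both a custom id and a visibility setting might look like this. The id "hot-topics" is an example value chosen here, not a name Daisy prescribes.

```xml
<d:navigationTree xmlns:d="http://outerx.org/daisy/1.0#navigationspec">
  <!-- Custom id instead of a generated one such as g1 -->
  <d:group id="hot-topics" label="Some title" visibility="when-active">
    <d:doc id="26" label="Introduction"/>
  </d:group>
</d:navigationTree>
```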

5.4.2.5 Import node

To import another navigation tree, use the import element:

<d:navigationTree xmlns:d="http://outerx.org/daisy/1.0#navigationspec">
  <d:group label="Some title">
    <d:doc id="26" label="Introduction"/>  
    <d:doc id="32" label="Hot Stuff">
      <d:doc id="15" label="Fire"/>
    </d:doc>
    <d:import docId="81"/>
  </d:group>
  <d:link url="http://outerthought.org" label="Outerthought"/>
</d:navigationTree>

The docId attribute on the d:import element is of course the id of the navigation document to be imported.

5.4.2.6 Query node

It is possible to dynamically insert nodes by including a query, for example:

<d:navigationTree xmlns:d="http://outerx.org/daisy/1.0#navigationspec">
  <d:doc id="26" label="Introduction"/>  
  <d:doc id="32" label="Hot Stuff">
    <d:doc id="15" label="Fire"/>
    <d:query q="select name where $somefield='hot'"/>
  </d:doc>
</d:navigationTree>

The selected value, in this example "name", will be used as the node label.

Since the query is embedded in an XML file, don't forget that you might need to escape certain characters, e.g. < should be entered as &lt;
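For example, a query using a less-than comparison would be written as follows. The field name $somenumber is hypothetical.

```xml
<d:navigationTree xmlns:d="http://outerx.org/daisy/1.0#navigationspec">
  <!-- The query itself reads: select name where $somenumber < 10 -->
  <d:query q="select name where $somenumber &lt; 10"/>
</d:navigationTree>
```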

Queries embedded in a navigation tree are executed while building the internal model of the navigation tree. This internal model is cached and shared by all users; it is only updated when relevant changes happen in the repository. So if a query contains items whose value can change on each execution (such as the result of a CurrentDate() call), it will not work as expected.

5.4.2.6.1 Selecting multiple values

If you select multiple values in the query, then group nodes will be created for the additional selected values, for example:

select documentType, $Category, name where true

With this query, group nodes will be created per value of documentType, within that per value of $Category, and then finally document nodes with the name as label.
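Embedded in a navigation tree, this query could look as follows. The values in the comment are made up, purely to sketch the shape of the resulting tree.

```xml
<d:navigationTree xmlns:d="http://outerx.org/daisy/1.0#navigationspec">
  <d:query q="select documentType, $Category, name where true"/>
  <!--
    Shape of the generated nodes (illustrative values only):
      Article                 group node, from documentType
        Cooking               group node, from $Category
          How to boil an egg  document node, labelled with name
  -->
</d:navigationTree>
```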

5.4.2.6.2 Selecting link values

When selecting a link value (as not-last value), a document node will be created instead of a group node.

5.4.2.6.3 Selecting multi-value and hierarchical values

It is allowed to select multi-value and/or hierarchical values. Selecting a hierarchical value will cause the creation of a navigation hierarchy corresponding to the hierarchical path. For multi-value values, nodes will be created for each of the values.

5.4.2.6.4 Re-sorting nodes

Inside the d:query element, you can insert d:column elements specifying options for each selected value (= each column in the query result set). The number of d:column elements is not required to be equal to the number of selected values (if there are more d:column elements than selected values, the additional ones are ignored). The syntax is:

<d:query ...>
  <d:column sortOrder="none|ascending|descending" visibility="..."/>
  ... more d:column elements ...
</d:query>

Both the sortOrder and the visibility attributes are optional.

Note that it is also possible to use the "order by" clause of the query to influence the sort order. However, with multi-value values or hierarchical values with paths of varying length, it might be necessary to have the nodes sorted after the tree has been built. Specifying both an order by clause and sortOrder attributes on the d:column elements is not useful; it only increases the time needed to build the navigation tree.
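A concrete sketch, assuming that the d:column elements apply to the selected values in order and that the visibility attribute takes the same values as on the d:doc node:

```xml
<d:navigationTree xmlns:d="http://outerx.org/daisy/1.0#navigationspec">
  <d:query q="select $Category, name where true">
    <!-- First selected value ($Category): sort descending -->
    <d:column sortOrder="descending"/>
    <!-- Second selected value (name): sort ascending, only visible when active -->
    <d:column sortOrder="ascending" visibility="when-active"/>
  </d:query>
</d:navigationTree>
```

Since the sorting is done by the d:column elements, the query has no order by clause.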

5.4.2.6.5 Nesting nodes inside a query node

It is possible to nest any sort of node inside a query node, including query nodes themselves. The nested nodes will be executed once for each result in the query, and inserted at the deepest node created by the result set row. If there are any multivalue fields among the selected values in the query, this will be done multiple times.

Attributes of nested nodes can refer to the current result row and its selected values using ${...} syntax. Available are: ${documentId}, ${branchId}, ${languageId}, and each selected value can be accessed using ${1}, ${2}, ...

When query nodes are nested inside query nodes, ${../1} syntax can be used to refer to values from higher-level query nodes.

If the expressions are inside a "q" attribute of a query node, the value will be automatically formatted appropriately for use in queries. For example, correct date formatting or escaping of string literals. Quotes will be added automatically as necessary, so you can just do something like $MyStringField = ${3}.

If the expression is used in the url attribute of a link node, the value will be automatically URL-encoded (using UTF-8).

In all other cases, a simple "toString" of the value is inserted.

When an expression refers to a non-existent value (e.g. ${5} when there are fewer than 5 selected values), the expression will be left untouched (e.g. the output will contain ${5}).

If a selected value is multivalue or hierarchical (or both) it is currently not made available for retrieval in expressions.

To insert ${ literally, use the escape syntax \${.
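The following sketch puts these rules together. The document type 'Author', the field $AuthorName and the example.org URL are hypothetical. The comments reflect one reading of the rules above: in the q attribute of the inner query, ${1} refers to the row of the enclosing query, while inside the inner query ${1} refers to the inner row and ${../1} to the outer one.

```xml
<d:navigationTree xmlns:d="http://outerx.org/daisy/1.0#navigationspec">
  <!-- Outer query: one node per document of the hypothetical type 'Author' -->
  <d:query q="select name where documentType = 'Author'">
    <!-- ${1} is the name selected by the outer query; quotes are added automatically -->
    <d:query q="select name where $AuthorName = ${1}">
      <!-- ${1} is the inner name, ${../1} the outer name;
           both are URL-encoded because they occur in the url attribute -->
      <d:link url="http://www.example.org/search?title=${1}&amp;author=${../1}"
              label="Look up ${1}"/>
    </d:query>
  </d:query>
</d:navigationTree>
```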

5.4.2.6.6 useSelectValues attribute

If you want to select some values in a query to make them accessible to child nodes, but don't want these values to be used for constructing nodes in the tree, you can use the useSelectValues attribute of the d:query element. The value of this attribute is a number specifying how many of the selected values should be used (counting from the left). This value may be 0, in which case the query node itself will not add any nodes to the navigation tree (this only makes sense if the query node has child nodes).
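A minimal sketch, assuming a hypothetical field $ProductCode and an example.org URL: only the first selected value is used to build nodes, while the second remains available to the nested link node.

```xml
<d:navigationTree xmlns:d="http://outerx.org/daisy/1.0#navigationspec">
  <!-- Two values are selected, but only the first (name) is used to build nodes -->
  <d:query q="select name, $ProductCode where $ProductCode is not null" useSelectValues="1">
    <!-- The second value is still available to child nodes as ${2} -->
    <d:link url="http://www.example.org/products?code=${2}" label="Product page"/>
  </d:query>
</d:navigationTree>
```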

5.4.2.6.7 Default visibility

The query element can have a visibility attribute to specify the default visibility, for cases where the visibility is not specified for individual columns.

5.4.2.6.8 Filter variants

The query element can have an optional attribute called filterVariants with value true or false. If true, the query results will be automatically limited to the branch and language of the navigation document.
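Both attributes can be combined on one query element, for example:

```xml
<d:navigationTree xmlns:d="http://outerx.org/daisy/1.0#navigationspec">
  <!-- Nodes default to when-active visibility; results are limited to the
       branch and language of the navigation document -->
  <d:query q="select name where $somefield='hot'"
           visibility="when-active" filterVariants="true"/>
</d:navigationTree>
```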

5.4.2.6.9 Example: query-import navigation trees

If you want to dynamically import navigation trees using a query, you can use the following construct:

<d:query q="select id where documentType = 'Navigation' order by name" useSelectValues="0">
  <d:import docId="${documentId}" branch="${branchId}" language="${languageId}"/>
</d:query>

5.4.2.6.10 Example: generating link nodes

<d:query q="select name where true" useSelectValues="0">
  <d:link label="Search on google for ${1}"
          url="http://www.google.com/search?q=${1}"/>
</d:query>

5.4.2.7 Separator node

A separator node is simply a separating line between two nodes in the navigation tree. Its syntax is:

<d:separator/>

Multiple sibling separator nodes, or separator nodes appearing as first or last node within their parent, are automatically hidden.
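Placed in a tree, a separator could be used like this:

```xml
<d:navigationTree xmlns:d="http://outerx.org/daisy/1.0#navigationspec">
  <d:doc id="26" label="Introduction"/>
  <!-- Draws a line between the document node and the link node -->
  <d:separator/>
  <d:link url="http://outerthought.org" label="Outerthought"/>
  <!-- A separator as last node within its parent is hidden automatically -->
  <d:separator/>
</d:navigationTree>
```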

5.4.2.8 Associating a navigation tree with collections

If you want to automatically limit the result of queries in the navigation tree to documents contained in one or more collections, you can add a collections element as first child of the navigationTree element:

<d:navigationTree xmlns:d="http://outerx.org/daisy/1.0#navigationspec">
  <d:collections>
    <d:collection name="MyCollection"/>
  </d:collections>
  <d:doc id="26" label="Introduction"/>  
  <d:doc id="32" label="Hot Stuff">
    <d:doc id="15" label="Fire"/>
    <d:query q="select name where $somefield='hot' order by name"/>
  </d:doc>
</d:navigationTree>
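To limit the queries to more than one collection, list several collection elements. In this sketch, "MyOtherCollection" is a hypothetical collection name.

```xml
<d:navigationTree xmlns:d="http://outerx.org/daisy/1.0#navigationspec">
  <d:collections>
    <d:collection name="MyCollection"/>
    <d:collection name="MyOtherCollection"/>
  </d:collections>
  <d:query q="select name where $somefield='hot' order by name"/>
</d:navigationTree>
```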

5.4.2.9 Node nesting

The doc, group, query, separator, link and import nodes can be combined and nested as you desire, with the exception that separator and import can't have child elements.

Any other elements besides the ones mentioned here are prohibited, as is text in between the nodes.

5.4.3 Implementation notes

The Navigation Manager is implemented as an extension component running inside the repository server. It has its own HTTP+XML interface and remote Java API.

5.5 Faceted Browser

5.5.1 Introduction

The Daisy Wiki includes a faceted browser which allows for faceted navigation through the repository. The faceted browser shows the distinct values for selected properties (facets) of the documents in the repository, and lets you search for documents by selecting values for these facets. This technique is common on many websites, and Daisy's faceted browser makes it easy to add to your site.

A somewhat bland demo (i.e. only using system properties) of the faceted browser can be found on the main cocoondev.org site.

To use the faceted browser, you need to create a small configuration file in which you list the facets (document properties) to use. You can have multiple faceted navigation configurations.

5.5.2 Howto

Faceted navigations are defined on a per-site level. In the directory for the site, create a subdirectory called "facetednavdefs" if it does not already exist. Thus the location for this directory is:

<wikidata directory>/sites/<sitedir>/facetednavdefs

In this directory, create a file with the extension ".xml", for example "test.xml". The content of the file should be something like this:

<facetedNavigationDefinition xmlns="http://outerx.org/daisy/1.0#facetednavdef">
  <options>
    <limitToSiteCollection>false</limitToSiteCollection>
    <limitToSiteVariant>true</limitToSiteVariant>
    <additionalSelects>
      <expression>variantLastModified</expression>
      <expression>variantLastModifierLogin</expression>
    </additionalSelects>
    <defaultConditions>true</defaultConditions>
    <defaultOrder>documentType ASC, name ASC</defaultOrder>
  </options>
  <facets>
    <facet expression="documentType"/>
    <facet expression="collections"/>
    <facet expression="lastModifierLogin"/>
  </facets>
</facetedNavigationDefinition>

About the content of this file:

The options limitToSiteCollection and limitToSiteVariant largely speak for themselves: they define whether the query should automatically be limited to documents belonging to the collection, and to the branch and language, of the current site. If you want to include the collection or the branch/language as facets to search on, set the respective option to false; otherwise set it to true. In this example, since collections is included in the list of facets, limitToSiteCollection is set to false.

The <facet> elements list the different facets on which the user can browse. The expression attribute contains an identifier as used in the Daisy Query Language. Thus to include document fields, use $fieldname.
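For example, the facets section of the definition file could mix a system property with a document field. The field name Category is hypothetical here.

```xml
<facets>
  <!-- A system property -->
  <facet expression="documentType"/>
  <!-- A document field, written as in the Daisy Query Language -->
  <facet expression="$Category"/>
</facets>
```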

<additionalSelects> is an optional element which adds extra identifiers to the select clause of the query which is sent to the repository. Identifiers are the same ones found in the query language and are set as a list of expression elements. The 'name' and 'summary' identifiers are always the first two identifiers found in the query.

The <defaultConditions> element is optional, and contains a set of query conditions to limit the set of documents on which the faceted navigation will be done, for example:

<defaultConditions>$someField = 'abc' and $someOtherField='def'</defaultConditions>

The <defaultOrder> element is also optional. This element is used to set the default order in which search results will be sorted. The syntax is the same as the order by clause of the query language.

The faceted navigation definition file is validated against an XML Schema, so don't put any additional elements in it or validation will fail.

Once you have saved this file, you can use the faceted browser immediately to browse on the defined facets (a restart of the Daisy Wiki is not needed). The faceted browser is accessed with an URL of this form:

http://localhost:8888/daisy/yoursite/facetedBrowser/test

In which you need to replace "yoursite" with the name of your site and "test" with the name of the file you just created, without the ".xml" extension.

5.5.3 Usage

5.5.3.1 Faceted browser initialisation

You can define different initialisations for the faceted browser in the faceted navigation definition file.  This can be done using an optionsList element.

<facetedNavigationDefinition xmlns="http://outerx.org/daisy/1.0#facetednavdef">
  <optionsList defaultOptions="standard">
    <options id="standard">
      <limitToSiteCollection>false</limitToSiteCollection>
      <limitToSiteVariant>true</limitToSiteVariant>
      <defaultConditions>true</defaultConditions>
    </options>
    <options id="doctype">
      <limitToSiteCollection>false</limitToSiteCollection>
      <limitToSiteVariant>true</limitToSiteVariant>
      <defaultConditions>documentType='{request-param:docType|SimpleDocument}'</defaultConditions>
      <defaultOrder>documentType ASC</defaultOrder>
    </options>
  </optionsList>
  <facets>
    <facet expression="documentType"/>
    <facet expression="collections"/>
    <facet expression="lastModifierLogin"/>
  </facets>
</facetedNavigationDefinition>

If you wish to be able to choose from a range of different options without having to create different definition files, you can use the optionsList element. It contains a list of options definitions, each identified by the id attribute on its options element. To indicate which set of options should be used by default, set the defaultOptions attribute to the id of one of the options elements.

Once you have defined your optionsList, you can select one of its options by adding a request parameter to the URL. It would look something like this:

http://localhost:8888/daisy/yoursite/facetedBrowser/test?options=doctype

The definition also contains the following construct:

{request-param:docType|SimpleDocument} --> {request-param:request-parameter-name|default-request-parameter-value}

If the specified request parameter exists, the {...} will be substituted by the parameter value. If no such parameter exists, the default value will be used. For our example, the URL looks something like this:

http://localhost:8888/daisy/yoursite/facetedBrowser/test?options=doctype&docType=SomeDocumentType

5.5.3.2 Showing the navigation tree

If you wish to have the navigation tree displayed in the faceted browser you can specify the navigation path as a request parameter (activeNavPath).  Here is an example :

http://localhost:8888/daisy/yoursite/facetedBrowser/test?activeNavPath=/path/to/facetedBrowser

The presence of the parameter will convey your wish to see the navigation tree and set the active navigation node to the specified path.

5.5.3.3 Using an alternative stylesheet

If you wish to use a different stylesheet for the faceted browser than the one found in <skin-dir>/xslt/faceted_browser.xsl, you can specify this in the faceted navigation definition file. Your file might look something like this:

<facetedNavigationDefinition xmlns="http://outerx.org/daisy/1.0#facetednavdef">
  <stylesheet src="daisyskin:facetednav-styling/myfaceted_browser.xsl"/>
  <options>
   ...

Let's follow the example above. First create a directory named 'facetednav-styling' in your skins directory. Then create a file named 'myfaceted_browser.xsl' in this new directory. The contents of the file could be something like this:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:d="http://outerx.org/daisy/1.0"
  xmlns:p="http://outerx.org/daisy/1.0#publisher"
  xmlns:n="http://outerx.org/daisy/1.0#navigation"
  xmlns:i18n="http://apache.org/cocoon/i18n/2.1"
  xmlns:daisyutil="xalan://org.outerj.daisy.frontend.util.XslUtil"
  xmlns:urlencoder="xalan://java.net.URLEncoder">  
<!-- Import the original stylesheet -->
  <xsl:import href="daisyskin:xslt/faceted_browser.xsl"/>
<!-- The customization -->
  <xsl:template name="content">
    <h1>My own faceted browser</h1>
    <div class="facetbrowser-resultcount"><xsl:value-of select="d:facetedQueryResult/d:searchResult/d:resultInfo/@size"/> document(s) found.</div>
    <br/>
    <xsl:call-template name="options"/>
    <br/>    
    <br/>
    <xsl:call-template name="results"/>
    <xsl:call-template name="javascript"/>
  </xsl:template>
</xsl:stylesheet>

In the simple example above, the title on the faceted browser page was changed and the link to the query page was removed. Note how the original faceted browser stylesheet is used as a base stylesheet. This stylesheet can be found here:

<daisy_home>/daisywiki/webapp/daisy/resources/skins/default/xslt/faceted_browser.xsl

Have a look in there to get an idea of what you can customize.

5.5.3.4 Defining discrete facets

If you fear that your facet has too many values to be displayed in a manageable way, you can define discrete facets. These facets place values in a series of ranges. There are 3 types of discrete facets, selected with the type attribute: DATE, NUMBER and STRING.

By using the type attribute you tell Daisy that the values of this facet can be made discrete.
Here is an example:

<facets>  
<!-- A date discrete facet with a parabolic spread. -->
  <facet expression="$SomeDate" type="DATE" threshold="7" spread="2.0">
    <properties>
      <property name="threshold" value="7"/>
      <property name="spread" value="2.0"/>
    </properties>
  </facet>
 <!-- Number discrete facet with a linear spread -->
  <facet expression="$SomeNumber" type="NUMBER">
    <properties>
      <property name="threshold" value="7"/>
    </properties>
  </facet>
  <facet expression="$SomeString" type="STRING">
    <properties>
      <property name="threshold" value="7"/>
    </properties>
  </facet>
</facets>

5.5.4 Further pointers

Faceted Classification Discussion mailing list.

5.6 URL space management in the Daisy Wiki

5.6.1 Overview

The URL space of the documents when published through a Daisy Wiki site is related to the hierarchical navigation tree of the site. With URL space we mean the URLs that get assigned to documents, or the other way around, what URL you need to enter in the location bar of your web browser to get a certain document. This document goes into some details about how all this works.

URL stands for Uniform Resource Locator. It is how resources (documents etc) on the world wide web are addressed, in other words the widely-known http://host.com/some/path things.

5.6.2 The (non-)relation between the Daisy repository and the URL space

The Daisy repository is a flat (non-hierarchical) document-store. It isn't necessarily web-related, and hence doesn't define, dictate or influence how documents are actually published on the web, including how these documents will map to the URL space.

Recall that documents in the repository are identified by a unique, numeric ID. The uniqueness is within one repository, it is a sequence number starting at 1, not a global unique identifier.

5.6.3 URL mapping

The URL mapping in the Daisy Wiki is based on the hierarchical navigation tree. This means that when a document is requested, the navigation tree is consulted to resolve the path. The other way around, when publishing documents, the logical daisy:<document-id> links that occur in documents that are stored in the repository are translated to the path at which they occur in the navigation tree (if a document occurs at multiple locations, the first occurrence -- in a depth-first traversal -- is used).

5.6.3.1 Relation between the navigation tree and the URL space

Each nested node in the navigation tree becomes a part of the path in an URL. The name of the part of the path is the ID of the navigation tree node. What this ID is depends on the type of node:

If you want to have more readable URLs, it is recommended to assign node IDs in the navigation tree. With readable URLs we mean URLs containing meaningful words instead of automatically assigned numbers.

5.6.3.2 Importance of readable URLs?

It is in no way required to assign custom node IDs in the navigation tree. You only need to do this if you want to have readable, meaningful URLs.

Some advantages of having readable URLs are:

However, URLs containing the raw document IDs also have their advantages:

It is a good idea to standardise on some conventions when naming navigation tree nodes. For example, always use lower case and separate names consisting of multiple parts with dashes.

If all you want to have are some shortcut URLs for certain documents, independent of where they occur in the navigation tree, you can run Apache in front of the Daisy Wiki and configure redirects over there.

5.6.3.3 How URL paths are resolved in the Daisy Wiki

When a request for a certain path comes in, the Daisy Wiki will ask the navigation tree manager to look up that path in the navigation tree for the current site. There are a number of possible outcomes:

5.6.3.3.1 Site-search algorithm

The site search algorithm is used each time when a document might be more suited for display in the context of another site, thus when the document has not been found in the current site's navigation tree.

The sites that will be considered in the search can be configured in the siteconf.xml using the <siteSwitching> element, whose syntax is as follows:

<siteSwitching mode="stay|all|selected">
  <site>...</site>
  ... more <site> elements ...
</siteSwitching>

The mode attribute takes one of these values:

The site-search algorithm works as follows:

5.6.3.4 Not all documents must appear in the navigation tree

As a consequence of the above described resolving mechanism, any document can be accessed in the repository even if it does not occur in the navigation tree. Simply use an URL like:

http://host/daisy/mysite/<document-id>

In which <document-id> is the ID of the document you want to retrieve.

After each document URL you can add the extension .html, thus the above could also have been:

http://host/daisy/mysite/<document-id>.html

By default, the Daisy Wiki generates links with a .html extension, since this makes it easier to download a static copy of the site to the file system (otherwise you could have files and directories with the same name, which isn't possible).

5.7 Document publishing

5.7.1 Document styling

5.7.1.1 Introduction

The Daisy Wiki allows you to customize the styling of documents by means of an XSLT stylesheet. This custom styling is typically applied depending on the document type.

5.7.1.2 The Input XML

The input of the stylesheets is an XML document which has a structure as shown below. This is not an extensive schema containing every other element and attribute, but those that you'll need most often.

<document
    isIncluded="true|false"
    displayContext="standalone|something else"
    xmlns:d="http://outerx.org/daisy/1.0"
    xmlns:p="http://outerx.org/daisy/1.0#publisher">

  <context .../>
  <p:publisherResponse>
    <d:document xmlns:d="http://outerx.org/daisy/1.0"
        id="..."
        name="..."
        [... various other attributes ...] >

      <d:fields>
        <d:field typeId="..." name="..." label="..." valueFormatted="..."
                 [... other attributes and children ...]>
        ... more fields ...
      </d:fields>

      <d:parts>
        <d:part typeId="..." mimeType="..." size="..." label="..." daisyHtml="true/false">
          [... HTML content of the part including html/body if @daisyHtml=true ...]
        </d:part>
        ... more parts ...
      </d:parts>

      <d:links>
        <d:link title="..." target="..."/>
        ... more links ...
      </d:links>

      [... customFields, lockInfo, collectionIds ...]
    </d:document>
  </p:publisherResponse>
</document>

The isIncluded attribute on the document element indicates whether this document is being published as a top-level document or for inclusion inside another document. Sometimes you might want to style the document a bit differently when it is included.

Similarly, the displayContext attribute on the document element gives a hint toward the context in which a document is being displayed. The value "standalone" is used for cases where the document is displayed by itself, rather than as part of an aggregation. The value of the displayContext comes from the p:preparedDocument instruction in the publisher request.

The context element is the same as in the layout.xsl input. It provides access to various Wiki-context and user information. Before Daisy 1.5, the context element was not available, only a user element. The user element is still included for compatibility (not shown here) but will eventually be removed.

The p:publisherResponse element then contains the actual document (and possibly related information) to be published. It is the result of the p:preparedDocuments publisher request. It will always contain the basic d:document element but can contain additional information if a custom publisher request is used.
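As an illustration of using the isIncluded attribute, the sketch below leaves out the document title when the document is included inside another document. It assumes that the root element of the input is reachable as /document, as in the structure shown above, and it builds on the default document-to-html.xsl stylesheet of the skin.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:d="http://outerx.org/daisy/1.0">

  <xsl:import href="daisyskin:xslt/document-to-html.xsl"/>

  <xsl:template match="d:document">
    <!-- Show the title only when published as a top-level document -->
    <xsl:if test="/document/@isIncluded = 'false'">
      <h1 class="daisy-document-name"><xsl:value-of select="@name"/></h1>
    </xsl:if>
    <xsl:apply-templates select="d:parts/d:part"/>
    <xsl:apply-templates select="d:links"/>
    <xsl:apply-templates select="d:fields"/>
  </xsl:template>

</xsl:stylesheet>
```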

5.7.1.3 Expected stylesheet output

The output of the XSLT should be an embeddable chunk of HTML (or XSL-FO in the case of PDF). Thus no <html> and <body> elements, but something which can be inserted inside <body> (or inside a <div>, a <td>, etc). Where the produced output will end up depends on the stylesheet creating the general page layout, or in the case of included documents, the location of the inclusion.

5.7.1.4 Where the stylesheets should be put

The stylesheets should be placed in the following directory:

<wikidata directory>/resources/skins/<skin-name>/document-styling/<format>

In which <skin-name> is the name of the skin you're using (by default: "default"), and <format> either html or xslfo. Thus for the default skin, for HTML, this becomes:

<wikidata directory>/resources/skins/default/document-styling/html

The stylesheet should be named (case sensitive):

<document-type-name>.xsl

5.7.1.5 Example 1: styling fields in a custom way

Suppose we have a document type called "TestDocType" with a "SimpleDocumentContent" part, and two fields called "field1" and "field2". The default layout will first place the parts, then the fields (in a table), and then the out-of-line links (if any).

The stylesheet below shows how to put the fields at the top of the document:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:d="http://outerx.org/daisy/1.0">

  <xsl:import href="daisyskin:xslt/document-to-html.xsl"/>

  <xsl:template match="d:document">
    <h1 class="daisy-document-name"><xsl:value-of select="@name"/></h1>

    <p>
      Hi there! Here's the value of field1:
      <xsl:value-of select="d:fields/d:field[@name='field1']/@valueFormatted"/>
      and field 2:
      <xsl:value-of select="d:fields/d:field[@name='field2']/@valueFormatted"/>
    </p>

    <xsl:apply-templates select="d:parts/d:part"/>
    <xsl:apply-templates select="d:links"/>
    <!-- xsl:apply-templates select="d:fields"/ -->
  </xsl:template>

</xsl:stylesheet>

To minimize our efforts, we import the default stylesheet and only redefine what is needed. For comparison, the default template for d:document looks as follows:

<xsl:template match="d:document">
  <h1 class="daisy-document-name"><xsl:value-of select="@name"/></h1>
  <xsl:apply-templates select="d:parts/d:part"/>
  <xsl:apply-templates select="d:links"/>
  <xsl:apply-templates select="d:fields"/>
</xsl:template>

This new stylesheet should be saved as:

<wikidata directory>/resources/skins/default/document-styling/html/TestDocType.xsl

Now surf to a document based on TestDocType, and you should see the result.

5.7.1.6 Example 2: styling parts in a custom way

In this example, suppose we have a document type called "Article" with parts "Abstract" and "Body". We would like to render the abstract in a box. The stylesheet below shows how this can be done.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:d="http://outerx.org/daisy/1.0">

  <xsl:import href="daisyskin:xslt/document-to-html.xsl"/>

  <xsl:template match="d:document">
    <h1 class="daisy-document-name"><xsl:value-of select="@name"/></h1>

    <div style="margin: 20px; padding: 10px; border: 1px solid black; background-color: #ffd76c">
      <xsl:apply-templates select="d:parts/d:part[@name='Abstract']"/>
    </div>
    <xsl:apply-templates select="d:parts/d:part[@name='Body']"/>

    <xsl:apply-templates select="d:links"/>
    <xsl:apply-templates select="d:fields"/>
  </xsl:template>

</xsl:stylesheet>

5.7.2 Document information aggregation

5.7.2.1 Introduction

When a document is published in the Wiki, the Wiki will retrieve the document using a publisher request, and style it using a document-type specific stylesheet (or fall back to a default stylesheet).

The information available to this stylesheet is basically just the document with its parts and fields (and some wiki-context information such as the user etc.). Sometimes it might be useful to show additional information with the document.

For example, suppose you have a field "Category". When a document is published, you would like to show at the bottom of the document a list of documents which have the same value for the Category field as the document that is being published.

This can be done by making use of custom publisher request for these documents. The basic reference information on this can be found in the publisher documentation. Here we will have a look at how this applies to the Wiki using a practical example.

If you want to try out the example described here, then make a field Category (string), create a document type having this field (its name does not matter), and create a few documents of this document type, of which at least some share the same value for the Category field.

None of the changes described here require a restart of the repository server or the Wiki.

5.7.2.2 Creating a publisher request set

The first thing to do is to define a new publisher request set. In the daisydata directory (not wikidata directory), you will find a subdirectory called "pubreqs":

<daisydata directory>/pubreqs

In this directory, create a new subdirectory, for the purpose of this example we will call it "foobar":

<daisydata directory>/pubreqs/foobar

In this directory, we need to create three files:

Let's start with the mapping file. Create, in the foobar directory, a file named mapping.xml, with the following content:

<?xml version="1.0"?>
<m:publisherMapping xmlns:m="http://outerx.org/daisy/1.0#publishermapping">
  <m:when test="$Category is not null" use="categorized.xml"/>
  <m:when test="true" use="default.xml"/>
</m:publisherMapping>

This mapping specifies that when the document has a field Category, the publisher request in the file categorized.xml should be used. In all other cases, the second m:when will match and the publisher request in the file default.xml will be used. The expressions here are the same as used in the query language (and thus as in the ACL).

Instead of checking on the existence of a field, a more common case is to check on the document type. For this you would use an expression like documentType = 'MyDocType'.
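Such a mapping could look like the sketch below. The document type name MyDocType and the file name mydoctype.xml are placeholders.

```xml
<?xml version="1.0"?>
<m:publisherMapping xmlns:m="http://outerx.org/daisy/1.0#publishermapping">
  <!-- Documents of type MyDocType get their own publisher request -->
  <m:when test="documentType = 'MyDocType'" use="mydoctype.xml"/>
  <!-- Everything else falls through to the default request -->
  <m:when test="true" use="default.xml"/>
</m:publisherMapping>
```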

Create (in the same foobar directory) a file called default.xml with the following content:

<?xml version="1.0"?>
<p:publisherRequest xmlns:p="http://outerx.org/daisy/1.0#publisher">
  <p:prepareDocument/>
  <p:aclInfo/>
  <p:subscriptionInfo/>
</p:publisherRequest>

This is the same publisher request that is used by default when no custom publisher requests are defined.

Now we arrive at the most interesting part: the publisher request for documents having a Category field. Create a file called categorized.xml with the following content:

<?xml version="1.0"?>
<p:publisherRequest xmlns:p="http://outerx.org/daisy/1.0#publisher" styleHint="categorized.xsl">
  <p:prepareDocument/>
  <p:aclInfo/>
  <p:subscriptionInfo/>

  <p:group id="related">
    <p:performQuery>
      <p:query>select name where $Category = ContextDoc($Category) and id != ContextDoc(id)</p:query>
    </p:performQuery>
  </p:group>

</p:publisherRequest>

Note the difference with the publisher request in default.xml. We have now included a query which retrieves the documents with the same value for the Category field, but excluding the current document. The only purpose of the p:group element is to make it possible to distinguish this query from other queries we might add in the future.

Also note the styleHint attribute. This is an optional attribute that can be used to indicate the stylesheet to be used (instead of relying on the document-type specific styling).

5.7.2.3 Telling the Wiki to use the new publisher request set

In the Wiki, the publisher request set to be used can be specified per site (in the siteconf.xml) or can be changed for all sites (in the global siteconf.xml). Either way, this is done by adding a <publisherRequestSet> element in the siteconf.xml. For this example, we will change it globally, thus we edit:

<wikidata directory>/sites/siteconf.xml

As a child of the root element (the order of the elements does not matter), we add:

<publisherRequestSet>foobar</publisherRequestSet>
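For orientation, a complete global siteconf.xml with this element added might look as follows. This is only a sketch: it assumes the file otherwise contains nothing but the skin element shown in the skinning section, and any elements already present in your own file should be kept.

```xml
<siteconf xmlns="http://outerx.org/daisy/1.0#siteconf">
  <!-- Existing setting, kept as it was (assumed to be the only one) -->
  <skin>default</skin>
  <!-- New: name of the subdirectory created below pubreqs -->
  <publisherRequestSet>foobar</publisherRequestSet>
</siteconf>
```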

It can take a few seconds before the Wiki notices this change, but you do not need to restart the Wiki for this. If you now look at documents with a Category field, they will still look the same as before, since we have not yet adjusted the stylesheets to display the new information.

5.7.2.4 Creating a stylesheet

The styling is just the same as with regular document-type specific styling, however the styling here is not specific to the document type but rather driven by the publisher request. In the publisher request we used the styleHint attribute to tell the Wiki it should use the categorized.xsl stylesheet. Other than that, everything is the same as for document-type specific styling. Thus we need to create a file categorized.xsl at the following location:

<wikidata directory>/resources/skins/default/document-styling/html/categorized.xsl

Since these stylesheets are in the same location as the document-type specific stylesheets, take care that their names do not conflict with document types (unless this is intended, of course). If your custom publisher requests are tied to the document type, there is no need to specify the styleHint attribute, as the normal document-type specific styling will do its job.

Here is an example categorized.xsl:

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:d="http://outerx.org/daisy/1.0"
  xmlns:p="http://outerx.org/daisy/1.0#publisher"> 
  
  <xsl:import href="daisyskin:xslt/document-to-html.xsl"/>

  <xsl:template match="d:document">
    <h1 class="daisy-document-name"><xsl:value-of select="@name"/></h1>
    <xsl:apply-templates select="d:parts/d:part"/>
    <xsl:apply-templates select="d:links"/>
    <xsl:apply-templates select="d:fields"/>

    <hr/>
    Other documents in this category:
    <ul>
      <xsl:for-each select="../p:group[@id='related']/d:searchResult/d:rows/d:row">
        <li>
          <a href="{$documentBasePath}{@documentId}?branch={@branchId}&amp;language={@languageId}">
            <xsl:value-of select="d:value[1]"/>
          </a>
        </li>
      </xsl:for-each>
    </ul>

    <xsl:call-template name="insertFootnotes">
      <xsl:with-param name="root" select="."/>
    </xsl:call-template>
  </xsl:template>
  
</xsl:stylesheet>

And that's it.

5.7.3 Link transformation

This section is about the transformation of links in the Daisy Wiki from "daisy:" to the public URL space.

5.7.3.1 Format of the links

It might be good to review the format of the links first. The structure of a Daisy link is:

daisy:docid@branch:language:version#fragmentid

The branch, language and version are all optional. Branch and language can be specified either by name or ID. The version is typically a version ID, or the string "live" (default) or "last" (to link to the last version). The fragment identifier is of course also optional.

A fragment identifier is used to point to a specific element in the target document.

If the branch and language are not mentioned, they default to the branch and language of the document containing the link (and thus not to the default branch and language of the Daisy Wiki site).

A link can also consist of only a fragment identifier, to link to an element within the current document:

#fragmentid

The document addressed by a link is called the target document.
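The following fragment sketches a few links that follow the format described above. The document ID 123, the branch name main, the language name en, the numeric IDs and the fragment identifier intro are all made-up values for illustration.

```xml
<p>
  <!-- Only a document ID: branch and language of the linking document, live version -->
  <a href="daisy:123">document 123</a>

  <!-- Branch and language given by name -->
  <a href="daisy:123@main:en">document 123, branch main, language en</a>

  <!-- Branch and language given by ID, plus an explicit version ID -->
  <a href="daisy:123@1:2:5">version 5 of document 123</a>

  <!-- Last version instead of the live one, pointing at a specific element -->
  <a href="daisy:123@main:en:last#intro">the intro of the last version</a>

  <!-- Fragment identifier only: an element within the current document -->
  <a href="#intro">the intro of this document</a>
</p>
```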

5.7.3.2 When and what links are transformed

The link transformation happens after the document styling XSLT has been applied.

The transformation applies to links in the following places:

5.7.3.3 Input for the document styling XSLT

Since the link transformation happens after the document styling, the document styling XSLT can influence the linking process (see "Linking directly to parts" below). For this, it is useful to know that the publisher leaves some information on links about the target document they link to. An annotated link looks like this:

<a href="daisy:123" p:navigationPath="/info/123">
  <p:linkInfo documentName="..." documentType="...">
    <p:linkPartInfo id="..." name="..." fileName="..."/>
  </p:linkInfo>
  see this
</a>

The whitespace and indentation are added here for readability; in reality no new whitespace is introduced (since whitespace inside inline elements is significant).

The p:navigationPath attribute gives the path in the navigation tree where the document occurs. This attribute is only added if such a path exists, and if the navigation tree is known (i.e. if it is specified to the publisher, which in the Daisy Wiki is always the case). You should usually leave the p:navigationPath attribute alone: the link transformation process uses it to make the link point directly to the correct navigation location, and removes the attribute afterwards.

The p:linkInfo element is only added if the target document exists and is accessible (i.e. the user has read permissions on it). It specifies the name of the target document and the name of its document type. Also, for each part in the document, a p:linkPartInfo element is added. Its id and name attributes specify the part type ID and part type name of the part. The fileName attribute is only added if the part has a file name.

It is the responsibility of the document styling XSLT to remove the p:linkInfo element. This is very simple with an empty template that matches on this element (as is the case in the default document-to-html.xsl).
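As a rough sketch of both points, the templates below first use the annotation and then remove it. The first template copies the target document name into a title attribute; that use of the name is an illustrative assumption, not something the default stylesheet is documented to do. The second is the empty template mentioned above.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:p="http://outerx.org/daisy/1.0#publisher">

  <!-- Illustrative only: expose the target document name as a tooltip.
       All original attributes (including p:navigationPath) are kept,
       so the link transformer can still do its work afterwards. -->
  <xsl:template match="a[p:linkInfo]">
    <a title="{p:linkInfo/@documentName}">
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </a>
  </xsl:template>

  <!-- Remove the annotation element itself from the output -->
  <xsl:template match="p:linkInfo"/>

</xsl:stylesheet>
```

In a stylesheet that imports document-to-html.xsl, the empty template is already provided, and a template like the first one would take precedence over whatever the imported stylesheet does for such links, so it should be treated as an illustration rather than a drop-in addition.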

5.7.3.4 Linking directly to parts

By default, the links will be transformed to links that point to the target document. This seems obvious, but sometimes it is desirable to link directly to the data of a part of the target document, for example for images. The link transformation process can be instructed to do so by leaving special attributes on the link element (<a> or <img>). These attributes are in the link transformer namespace:

http://outerx.org/daisy/1.0#linktransformer

which is typically associated with the prefix lt.

The attributes are:

Examples of how to put this to use can be found in the document-to-html.xsl, more specifically look at how images and attachments are handled there.

5.7.3.5 Branch and language handling

If the branch and language differ from those specified in the site configuration, branch and language request parameters will be added.

5.7.3.6 Fragment ID handling

Fragment identifiers are prefixed with a string identifying the target document. This is needed because it is possible to publish multiple Daisy documents in one HTML page (e.g. using document includes), and the same element ID might be used in multiple documents, giving conflicts.

The format of the prefix is dsy<docid>_. An example prefixed fragment identifier looks like this:

#dsy123_hello

in which "123" is the ID of the target document, and "hello" the original fragment identifier target.

To make this work, the actual element IDs in Daisy documents are also prefixed with dsy<docid>_.
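To illustrate the prefixing, assume document 123 contains a heading with the ID hello (the heading and the link text are made up for this example). The published HTML would then carry the prefixed ID, and a link to that heading would end in the prefixed fragment. The path part of the link is left as "..." because it depends on the site.

```xml
<div>
  <!-- Element in the source of document 123: <h2 id="hello">Hello</h2> -->
  <h2 id="dsy123_hello">Hello</h2>

  <!-- Source link: <a href="daisy:123#hello">; "..." stands for the transformed document URL -->
  <a href="...#dsy123_hello">see the greeting</a>
</div>
```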

5.7.3.7 Disabling the link transformer

If you want the link transformer to leave a certain link alone, add an attribute lt:ignore="true" on the link element.
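A minimal sketch of such a link, using the link transformer namespace given earlier (the document ID and link text are placeholders):

```xml
<a href="daisy:123"
   xmlns:lt="http://outerx.org/daisy/1.0#linktransformer"
   lt:ignore="true">see this</a>
```

In a stylesheet, the lt prefix would more typically be declared once on the xsl:stylesheet element rather than on each link.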

5.7.4 Document publishing internals

Here we will eventually add a complete description of how the process of getting a document published in the Daisy Wiki works behind the curtains.

For now, please see the Cocoon GT presentation on this subject.

5.8 Daisy Wiki Skinning

Customising the look and feel of the Daisy Wiki is possible through:

Daisy ships with one skin called default.

The skin to use is configurable on the level of a site in the siteconf.xml file. Thus different sites can use different skins.

Pages not belonging to a particular site (such as the login screen, the sites index page, etc.) use a globally configured skin, defined in the global siteconf.xml file:

<wikidata directory>/sites/siteconf.xml

If this file does not exist, the skin called default will be used. The content of the global siteconf.xml file should be like this:

<siteconf xmlns="http://outerx.org/daisy/1.0#siteconf">
  <skin>default</skin>
</siteconf>

5.8.1 skinconf.xml

5.8.1.1 Introduction

The skinconf.xml file is simply an XML file which is merged into the general XML stream and is available to the XSLT stylesheets, most notably the layout.xsl. This makes it possible to pass configuration information to the XSLT stylesheets. The configuration that is actually supported depends on the XSLT, and thus on the skin. The supported configuration for the default skin is given below.

A skinconf.xml file can be put in the site directory, or in the global sites directory:

<wikidata directory>/sites/skinconf.xml

If a site doesn't have its own skinconf.xml file, it will fall back to using the global one.

5.8.1.2 default skin skinconf.xml

<skinconf>
  <logo>resources/local/mylogo.png</logo>
  <daisy-home-link>Daisy Home</daisy-home-link>
  <site-home-link>Site Home</site-home-link>
</skinconf>

Each of the parameters (= XML elements) is optional.

The parameters largely speak for themselves:

It can take up to 10 seconds before changes made to a skinconf.xml file are noticed.

5.8.2 Creating a skin

5.8.2.1 The anatomy of a skin

A skin consists of a set of files: CSS file(s), images, XSLT stylesheets, and possibly others which are grouped below one directory. The directory containing the skins is located at:

<wikidata directory>/resources/skins

The name of the skin is the name of the directory below the skins directory. On a blank Daisy install, this skins directory will be empty. The default skin can be found in:

<DAISY_HOME>/daisywiki/webapp/daisy/resources/skins/default

5.8.2.2 Creation of a dummy skin

Daisy has a fallback mechanism between skins, which means that a new skin can be created based on an existing skin. This makes the initial effort of creating a skin very small.

As an example, suppose you want to create a skin called coolskin. The minimal steps to do this are:

  1. Create a directory for the skin:
    <wikidata directory>/resources/skins/coolskin
  2. In this newly created directory, put a file called baseskin.txt containing just one line like this:
    default

    (this should be the very first line in that file)

    This specifies that the new skin will be based on the default skin. This means that any file which is not available in the coolskin skin, will instead be taken from the default skin. This allows a skin to contain only copies of those files that it wants to change.

    Although there is no directory called default in the skins directory, the system will transparently fall back to the skins directory in the Daisy Wiki webapp (mentioned above).

  3. Modify one or more siteconf.xml files to use the new skin. For the non-site specific pages (login screen, index page, ...) this is:
    <wikidata directory>/sites/siteconf.xml

    Or for a specific site, the siteconf.xml file in the directory of that site.
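For step 3, the global siteconf.xml would then read roughly as follows. This sketch assumes the file contains no other settings; keep any elements that are already there.

```xml
<siteconf xmlns="http://outerx.org/daisy/1.0#siteconf">
  <!-- Name of the directory created below <wikidata directory>/resources/skins -->
  <skin>coolskin</skin>
</siteconf>
```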

Basically, you have now created a new skin, although it does not change anything yet. If you hit refresh in your browser, the pages will still look the same.

5.8.2.3 Customising the new skin

Now you can start customising the skin by copying files from the default skin and adjusting them. The two most important files, which let you change most of the global look of the Daisy Wiki, are these:

  1. the <skindir>/css/layout.css file
  2. the <skindir>/xslt/layout.xsl file

If you only want to make smaller changes, such as changing some colours and fonts, you should be able to get by with copying only the docstyle.css file to your new skin and adjusting it. (Note: to change the logo, you can use the skinconf.xml mechanism.)

The layout.xsl file builds the global layout of a page, that is, how everything 'around' the main content should look. The input format of the XML that goes into the layout.xsl is described in the layout.xsl input XML specification.

5.8.3 layout.xsl input XML specification

This is the layout.xsl input contract.

<page>
  <!-- The context element is usually produced by the PageContext class
       (but the layout.xsl doesn't care about this of course) -->
  <context>
    <!-- Information about the Daisy Wiki version -->
    <versionInfo version="..." buildHostName="..." buildDateTime="..."/>

    <!-- The mountPoint is everything of the URI path that comes before
         the part matched by the Daisy sitemap. By default, this is /daisy -->
    <mountPoint>...</mountPoint>

    <!-- The current 'version mode' -->
    <versionMode>live|last</versionMode>

    <!-- The site element specifies some information about the current Daisy Wiki site,
         this is of course only required when working in the context of a site. -->
    <site
      name="..."
      title="..."
      description="..."
      navigationDocId="..."
      collectionId="..."
      collection="..."
      branchId="..."
      branch="..."
      languageId="..."
      language="..."/>

    <!-- skinconf: the contents of the skinconf.xml file of the current site,
         or of the global skinconf.xml file in case the current page is
         outside the context of a site. -->
    <skinconf/>

    <!-- user: information about the current user -->
    <user>
      <name>...</name>
      <login>...</login>
      <id>...</id>
      <activeRoles>
        <role id="..." name="..."/>
        (... multiple role elements ...)
      </activeRoles>
      <updateableByUser>true|false</updateableByUser>
      <availableRoles default="name of default role">
        <role id="..." name="..."/>
      </availableRoles>
    </user>

    <!-- layoutType: the type of layout that the layout.xsl must render.
         Three possibilities:
            - default: the normal layout, possibly with navigation tree,
                       page navigation links, links to other variants, etc.
            - mini: minimalistic layout, which shouldn't put the page content
                    inside a table (this layout is used by the editor screen,
                    and the HTMLArea in IE doesn't work when put inside a table).
            - plain: a layout that doesn't display anything beside the content.
    -->
    <layoutType>default|mini|plain</layoutType>

    <!-- request: some info about the request:
             - uri: the full request URI, including query string
             - method: GET, ...
             - server: scheme + host + port number if not 80
                       to which the HTTP request has been sent
    -->
    <request uri="..." method="..." server="..."/>

    <!-- skin: name of the current skin. Can be useful to use in paths to
         resources (images, css, js, ...) -->
    <skin>...</skin>
  </context>
  
  <!-- pageTitle: a title for the page, this is what comes inside the html/head/title
       element and thus in the title bar of the users' browser. This element may
       contain mixed content (e.g. i18n tags) so its content must be copied entirely,
       not just the string value. -->
  <pageTitle>...</pageTitle>

  <!-- layout hints (optional element):
        wideLayout: if there is no navigation tree and you want to make use of
                    the maximum available width, specify true.
        needsDojo: if you want dojo to be loaded on a non-cforms page, add this
                   attribute.
  -->
  <layoutHints wideLayout="true" needsDojo="true"/>

  <!-- A hierarchical navigation tree as produced by the navigation manager. Optional. -->
  <n:navigationTree/>

  <!-- ACL evaluation information of the navigation tree. Optional, if presen