Friday, July 17, 2009

Managing Maven repository with Artifactory

It has been close to a year since we started using Maven as the build tool for new projects. The overall experience with Maven has been very good. We have been managing a simple disk-based directory structure, served by Apache, as our local Maven repository. All artifacts in this repository are populated manually by copying them from a local machine. As more and more projects started using Maven, managing the repository became a headache.

We felt the need for a repository manager to minimize the overhead of managing the local repository by hand. This wiki article has a good comparison of Archiva, Artifactory and Nexus. At a glance, Artifactory and Nexus seem to have a more or less similar feature set. The out-of-the-box LDAP integration in Artifactory (Nexus has this feature only in the paid version) is a big differentiator, and it was the main reason for going with Artifactory instead of Nexus.

Artifactory keeps its data in a database using Jackrabbit, a Java Content Repository (JSR 170) implementation. Jackrabbit supports almost all popular databases; see the list here (these are the DDL scripts for all supported databases, which are executed when Artifactory is started for the first time). By default Artifactory uses Derby, and the only other database the documentation covers is MySQL. Personally I always prefer PostgreSQL over MySQL, so I decided to go with PostgreSQL along with a WAR-based deployment in Tomcat.

Artifactory works out of the box just by dropping the WAR file into Tomcat, and on first run it creates a Derby database in the ${user.home}/.artifactory/data directory. To change the database to PostgreSQL or something else, stop Tomcat and follow the steps given below-

1. Delete the ${user.home}/.artifactory/data directory.

2. Uncomment and change the following line in the ${user.home}/.artifactory/etc/artifactory.system.properties file-
    artifactory.jcr.configPath=repo/postgresql

3. Create ${user.home}/.artifactory/etc/repo/postgresql/repo.xml with the following contents (this is based on the MySQL configuration file available in the Artifactory code base)-

<?xml version="1.0" encoding="ISO-8859-1"?>
<!--<!DOCTYPE Repository SYSTEM "config.dtd">-->

<Repository>
    <!--
        virtual file system where the repository stores global state
        (e.g. registered namespaces, custom node types, etc.)
    -->

    <!-- PostgreSQL Filesystem -->
    <FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
        <param name="driver" value="org.postgresql.Driver"/>
        <param name="url" value="jdbc:postgresql://localhost:5432/artifactory"/>
        <param name="user" value="artifactory_user"/>
        <param name="password" value="password"/>
        <param name="schema" value="postgresql"/>
    </FileSystem>

    <!-- http://wiki.apache.org/jackrabbit/DataStore -->

    <!-- PostgreSQL Datastore -->
    <DataStore class="org.artifactory.jcr.jackrabbit.ArtifactoryDbDataStoreImpl">
        <param name="url" value="jdbc:postgresql://localhost:5432/artifactory"/>
        <param name="tablePrefix" value=""/>
        <param name="user" value="artifactory_user"/>
        <param name="password" value="password"/>
        <param name="databaseType" value="postgresql"/>
        <param name="driver" value="org.postgresql.Driver"/>
        <param name="minRecordLength" value="512"/>
        <param name="maxConnections" value="15"/>
        <param name="copyWhenReading" value="true"/>
    </DataStore>

    <!--
        security configuration
    -->
    <Security appName="Jackrabbit">
        <!--
            access manager:
            class: FQN of class implementing the AccessManager interface
        -->
        <AccessManager class="org.apache.jackrabbit.core.security.SimpleAccessManager">
            <!-- <param name="config" value="${rep.home}/access.xml"/> -->
        </AccessManager>

        <LoginModule class="org.apache.jackrabbit.core.security.SimpleLoginModule">
            <!-- anonymous user name ('anonymous' is the default value) -->
            <param name="anonymousId" value="anonymous"/>
            <!--
              default user name to be used instead of the anonymous user
              when no login credentials are provided (unset by default)
           -->
            <param name="defaultUserId" value="superuser"/>
        </LoginModule>
    </Security>

    <!--
        location of workspaces root directory and name of default workspace
    -->
    <Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="default"/>
    <!--
        workspace configuration template:
        used to create the initial workspace if there's no workspace yet
    -->
    <Workspace name="${wsp.name}">
        <!--
            virtual file system of the workspace:
            class: FQN of class implementing the FileSystem interface
        -->
        <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
            <param name="path" value="${wsp.home}"/>
        </FileSystem>
        <!--
            persistence manager of the workspace:
            class: FQN of class implementing the PersistenceManager interface
        -->

        <!-- PostgreSQL Persistence Manager -->
        <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.PostgreSQLPersistenceManager">
            <param name="url" value="jdbc:postgresql://localhost:5432/artifactory"/>
            <param name="user" value="artifactory_user"/>
            <param name="password" value="password"/>
            <param name="schemaObjectPrefix" value="${wsp.name}_"/>
        </PersistenceManager>

        <!-- http://issues.apache.org/jira/browse/JCR-314 -->
        <ISMLocking class="org.apache.jackrabbit.core.state.FineGrainedISMLocking"/>

        <!--
            Search index and the file system it uses.
            class: FQN of class implementing the QueryHandler interface

            If required by the QueryHandler implementation, one may configure
            a FileSystem that the handler may use.

            Supported parameters for lucene search index:
            - path: location of the index. This parameter is mandatory!
            - useCompoundFile: advises lucene to use compound files for the index files
            - minMergeDocs: minimum number of nodes in an index until segments are merged
            - volatileIdleTime: idle time in seconds until the volatile index is
              moved to persistent index even though minMergeDocs is not reached.
            - maxMergeDocs: maximum number of nodes in segments that will be merged
            - mergeFactor: determines how often segment indices are merged
            - maxFieldLength: the number of words that are fulltext indexed at most per property.
            - bufferSize: maximum number of documents that are held in a pending
              queue until added to the index
            - cacheSize: size of the document number cache. This cache maps
              uuids to lucene document numbers
            - forceConsistencyCheck: runs a consistency check on every startup. If
              false, a consistency check is only performed when the search index
              detects a prior forced shutdown. This parameter only has an effect
              if 'enableConsistencyCheck' is set to 'true'.
            - enableConsistencyCheck: if set to 'true' a consistency check is
              performed depending on the parameter 'forceConsistencyCheck'. If
              set to 'false' no consistency check is performed on startup, even
              if a redo log had been applied.
            - autoRepair: errors detected by a consistency check are automatically
              repaired. If false, errors are only written to the log.
            - analyzer: class name of a lucene analyzer to use for fulltext indexing of text.
            - queryClass: class name that implements the javax.jcr.query.Query interface.
              this class must extend the class: org.apache.jackrabbit.core.query.AbstractQueryImpl
            - respectDocumentOrder: If true and the query does not contain an 'order by' clause,
              result nodes will be in document order. For better performance when queries return
              a lot of nodes set to 'false'.
            - resultFetchSize: The number of results the query handler should
              initially fetch when a query is executed.
              Default value: Integer.MAX_VALUE (-> all)
            - extractorPoolSize: defines the maximum number of background threads that are
              used to extract text from binary properties. If set to zero (default) no
              background threads are allocated and text extractors run in the current thread.
            - extractorTimeout: a text extractor is executed using a background thread if it
              doesn't finish within this timeout defined in milliseconds. This parameter has
              no effect if extractorPoolSize is zero.
            - extractorBackLogSize: the size of the extractor pool back log. If all threads in
              the pool are busy, incoming work is put into a wait queue. If the wait queue
              reaches the back log size, incoming extractor work will not be queued anymore
              but will be executed with the current thread.
            - synonymProviderClass: the name of a class that implements
              org.apache.jackrabbit.core.query.lucene.SynonymProvider. The
              default value is null (-> not set).

            Note: all parameters (except path) in this SearchIndex config are default
            values and can be omitted.
        -->
        <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
            <param name="path" value="${rep.home}/index"/>
            <param name="useCompoundFile" value="true"/>
            <!-- Default is 100 -->
            <param name="minMergeDocs" value="500"/>
            <param name="maxMergeDocs" value="10000"/>
            <param name="volatileIdleTime" value="3"/>
            <!-- Default is 10: more segments quicker the indexing but slower the searching -->
            <param name="mergeFactor" value="10"/>
            <param name="maxFieldLength" value="10000"/>
            <!-- Default is 10 -->
            <param name="bufferSize" value="100"/>
            <param name="cacheSize" value="1000"/>
            <param name="forceConsistencyCheck" value="false"/>
            <param name="enableConsistencyCheck" value="true"/>
            <param name="autoRepair" value="true"/>
            <param name="analyzer" value="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
            <param name="queryClass" value="org.apache.jackrabbit.core.query.QueryImpl"/>
            <param name="respectDocumentOrder" value="false"/>
            <param name="resultFetchSize" value="700"/>
            <param name="supportHighlighting" value="true"/>
            <!--
            Use 5 background threads for text extraction that takes more than 100 milliseconds
            -->
            <param name="extractorPoolSize" value="5"/>
            <param name="extractorTimeout" value="100"/>
            <!-- Default is 100 -->
            <param name="extractorBackLogSize" value="500"/>
            <!-- Indexing configuration -->
            <!--<param name="indexingConfiguration" value="${rep.home}/index/index_config.xml"/>-->
        </SearchIndex>
    </Workspace>

    <!--
        Configures the versioning
    -->
    <Versioning rootPath="${rep.home}/version">
        <!--
            Configures the filesystem to use for versioning for the respective
            persistence manager
        -->
        <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
            <param name="path" value="${rep.home}/version"/>
        </FileSystem>

        <!--
            Configures the persistence manager to be used for persisting version state.
            Please note that the current versioning implementation is based on
            a 'normal' persistence manager, but this could change in future
            implementations.
        -->
        <!-- We do not use versioning -->
        <PersistenceManager class="org.apache.jackrabbit.core.persistence.mem.InMemPersistenceManager">
            <param name="persistent" value="false"/>
        </PersistenceManager>
    </Versioning>

    <!-- Clustering configuration -->
    <!--
    <Cluster id="node1">
        <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
            <param name="revision" value="${rep.home}/revision.log"/>
            <param name="driver" value="org.postgresql.Driver"/>
            <param name="url"
                   value="jdbc:postgresql://localhost:5432/artifactory"/>
            <param name="user" value="artifactory_user"/>
            <param name="password" value="password"/>
        </Journal>
    </Cluster>
    -->

</Repository>

4. Create the Artifactory database using the following script-
    CREATE ROLE artifactory_user LOGIN PASSWORD 'password' NOINHERIT VALID UNTIL 'infinity';
    CREATE DATABASE artifactory WITH ENCODING='UTF8' OWNER=artifactory_user;

5. Copy the PostgreSQL JDBC driver to the $TOMCAT_HOME/lib directory.

6. Restart Tomcat and we are good to go.

The following two steps are optional.

7. Set up the LDAP (Active Directory in our case) integration using the "Admin > Security > LDAP Settings" screen.

8. Using the "Admin > Security > Groups" screen, create a group, say 'developers', and check the option to automatically join all new users to this group. Now whenever someone tries to log in, Artifactory will delegate authentication to LDAP and, on successful authentication, create an account (if one does not already exist) and add that user to this group.
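
Before a build can use the new repository, Maven has to be pointed at it. One common way is a mirror entry in ~/.m2/settings.xml. The snippet below is only a minimal sketch and not the exact file used here: the mirror id "artifactory" and the host artifactory.example.com:8080 are placeholders I made up for illustration, so substitute your own server and port.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- ~/.m2/settings.xml: illustrative sketch with placeholder values -->
<settings>
    <mirrors>
        <mirror>
            <!-- "artifactory" is an arbitrary id chosen for this example -->
            <id>artifactory</id>
            <name>Artifactory virtual repository</name>
            <!-- Placeholder host and port; substitute your own server -->
            <url>http://artifactory.example.com:8080/artifactory/repo/</url>
            <!-- Route requests for every repository through this mirror -->
            <mirrorOf>*</mirrorOf>
        </mirror>
    </mirrors>
</settings>
```

With mirrorOf set to *, Maven sends all artifact and plugin requests to Artifactory, which is what you want if Artifactory is supposed to be the only gateway to remote repositories.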

With all installation and configuration steps complete, I proceeded with a project build using the Artifactory repository URL http://<server>:<port>/artifactory/repo/. Oops! Maven was not able to get any artifacts :-(. When I looked into the Artifactory logs, there was the following error message-

2009-07-17 17:17:56,932 [WARN ] (o.a.r.RemoteRepoBase:199) - repo1: Error in getting information for 'org/apache/maven/plugins/maven-compiler-plugin/2.0.2/maven-compiler-plugin-2.0.2.pom' (Failed retrieving resource from http://repo1.maven.org/maven2/org/apache/maven/plugins/maven-compiler-plugin/2.0.2/maven-compiler-plugin-2.0.2.pom: repo1.maven.org).

What could be the reason? The error message does not give much information. Well, the server was behind a proxy, and Artifactory needs to be told to use the proxy server for internet access. After configuring a proxy, one needs to edit the configuration of each remote repository individually to use the proxy. IMO it would have been better to have an option during proxy configuration to declare it as a global proxy to be used by all repositories.
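
For those who prefer editing the configuration file over clicking through the admin screens, the same wiring should be expressible in Artifactory's central configuration descriptor (artifactory.config.xml). The fragment below is a sketch written from memory of that descriptor's layout, so treat the element names as assumptions and verify them against the file shipped with your version. The proxy key "corp-proxy", the host proxy.example.com and the port 8080 are hypothetical values; only the repo1 key and URL come from the log message above.

```xml
<!-- Fragment of artifactory.config.xml (not a complete file).
     Element names are assumptions; check them against your version. -->
<remoteRepositories>
    <remoteRepository>
        <key>repo1</key>
        <url>http://repo1.maven.org/maven2</url>
        <!-- Each remote repository refers to the proxy by its key -->
        <proxyRef>corp-proxy</proxyRef>
    </remoteRepository>
</remoteRepositories>
<proxies>
    <proxy>
        <!-- Hypothetical proxy definition; substitute your own values -->
        <key>corp-proxy</key>
        <host>proxy.example.com</host>
        <port>8080</port>
    </proxy>
</proxies>
```

The proxy reference has to be repeated in every remote repository that needs it, which is exactly the per-repository editing described above.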

Now Artifactory is working fine and has taken some workload off my busy schedule, all thanks to the great work being done by the JFrog guys.