Saturday, August 18, 2012

How to customize published artifacts from Gradle

I am really enjoying Gradle.  That's probably a post for another day, but I haven't enjoyed a toolset this much since I discovered Google Guice years ago.

We have been slowly migrating our existing projects from Ant to Gradle, and encountered a few bumps on our way to customizing published build artifacts.  In case anyone else is similarly stumped, here are my findings.  We use Ivy for our dependency management, but Gradle abstracts many of the details, and for the purposes of this discussion it doesn't really matter whether you're using Ivy or Maven.  Also, for the sake of simplicity, I'm assuming we're using the basic resolvers and repo layout patterns.

There are three primary identifiers we're concerned with in dependency management.  Maven calls them group, artifact, and version; Ivy calls them organisation, module, and revision; but we're really talking about the same thing.

The Basics

On publish, Gradle defaults the artifact name to the project name, the version name to the string "unspecified," and leaves the group name blank.  So that means the simplest possible upload task:

 apply plugin: 'java' 
      
 uploadArchives {  
      repositories {  
           ivy {  
                url 'c:/temp/local'  
           }  
      }  
 }  

produces an artifact that looks like this:

 Published projectA to C:\temp\local//projectA/unspecified/projectA-unspecified.jar  
 Published ivy to C:\temp\local//projectA/unspecified/ivy-unspecified.xml  

Probably not what we'd typically want.  Fortunately, customizing group and version is easy.  They are both properties on the Project instance:

 apply plugin: 'java'       
   
 group = 'com.example'  
 version = '1.0'  
   
 uploadArchives {  
      repositories {  
           ivy {  
                url 'c:/temp/local'  
           }  
      }  
 }  

This produces:

 Published projectA to C:\temp\local/com.example/projectA/1.0/projectA-1.0.jar  
 Published ivy to C:\temp\local/com.example/projectA/1.0/ivy-1.0.xml  

This should cover the vast majority of publishing use cases.  Generally, these three primary values are all you need to publish your artifact.

Specialty fields

There do exist specialty use cases where additional fields would be useful; for example, publishing the source code with the binary, or indicating that a jar targets a specific JDK.  For these purposes, Maven introduced (and Ivy supports) an additional property known as a classifier.  Gradle adds another, an appendix, which can optionally be appended to the artifact name.

These are not fields on the project object and so cannot be set in the same way as group and version.  Instead, we set these properties on the archive creation task itself.  In the case of a jar, it might look something like this:

 apply plugin: 'java'       
   
 group = 'com.example'  
 version = '1.0'  
   
 jar {  
      appendix = 'myAppendix'  
      classifier = 'myClassifier'  
 }  
   
 uploadArchives {  
      repositories {  
           ivy {  
                url 'c:/temp/local'  
           }  
      }  
 }  

This produces the following output:

 Published projectA-myAppendix to C:\temp\local/com.example/projectA/1.0/projectA-myAppendix-1.0-myClassifier.jar  
 Published ivy to C:\temp\local/com.example/projectA/1.0/ivy-1.0.xml  

As you can see, the Ivy file name is unaffected.  Only the artifact itself takes on these parameters in its name. The Ivy file does, however, define the fields internally:

      <publications>  
           <artifact name="projectA-myAppendix" type="jar" ext="jar" conf="archives,runtime" m:classifier="myClassifier"/>  
      </publications>  
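
As an aside, the classifier is also the conventional way to handle the first use case mentioned above: publishing the source code alongside the binary.  A minimal sketch might look like the following (the task name sourcesJar is my own choice, not anything Gradle requires):

 task sourcesJar(type: Jar) {  
      // bundle the main source set into its own jar, marked with a classifier  
      classifier = 'sources'  
      from sourceSets.main.allSource  
 }  
   
 artifacts {  
      // attach the sources jar to the archives configuration so uploadArchives publishes it  
      archives sourcesJar  
 }  

With this in place, uploadArchives publishes a second artifact such as projectA-1.0-sources.jar next to the main jar, distinguished only by its classifier.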
   

Branching

A common use case in software development is to create feature branches in source control for long-lived or high-impact changes to a code base, and being able to publish branched artifacts separately from the mainline of development can be very useful.  Using the appendix or classifier field for this purpose is inappropriate because, as we saw above, they are intended for multiple related artifacts: the basic path to the artifact, as well as the ivy.xml descriptor file, remains the same.  For a branch, you'd need both of these to vary.

Ivy supports publishing artifacts on branches by providing a branch attribute at publish time, but Gradle does not.  (I'm not sure about Maven, but some quick Googling suggests it doesn't either.)  This creates a challenge: how can we easily publish branches in Gradle?

Vary the module name

One option is to vary the module name.  Instead of publishing an artifact as "projectA," publish it as "projectA-branch."  This is a problem in Gradle because, unlike the group and version fields, the name of the project is immutable (a limitation for which an issue has been logged).

The suggested workaround is to leverage the Gradle Settings object--which exists prior to any project--and modify the project names there.  Any dependent sub-projects would have to be modified as well.  So a sample settings.gradle file might look like this:

 // hack to support branching on module names, see GRADLE-2412  
 try {  
      rootProject.name += "-" + branch  
      rootProject.children.each {  
           it.name += "-" + branch  
      }  
 } catch (MissingPropertyException e) {  
      // there is no way to test for the existence of the branch property  
      //  so we swallow the exception if it doesn't exist, and publish  
      //  any artifacts with the default project name  
 }  

Here I've made my branch configurable by allowing the user to pass a -P property on the command line.  The hasProperty() function is not available to us in the Settings object, so we have to use try-catch in case the property does not actually exist.
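
For example, publishing a hypothetical branch named myFeature would look something like this on the command line (the property name matches the branch property used in settings.gradle above):

 gradle uploadArchives -Pbranch=myFeature  

Running the same command without the -P flag falls through to the catch block, leaving the project names untouched and publishing under the defaults.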

When the Project objects are instantiated, they will be created with the newly modified branch name, and Gradle will publish the artifact with that name.  However, one other change has to be made for this to work: any direct project dependencies also need to be modified to use the new branch name.

For example, a dependency code block that reads:

 dependencies {  
      compile (  
           project(':projectB'),  
           project(':projectC')  
      )  
 }  

Would have to be changed to read something like:

 dependencies {  
      compile (  
           project(':projectB' + '-' + branch),  
           project(':projectC' + '-' + branch)  
      )  
 }  

Vary the version

There is an easier way to have Gradle support a branch, and that's to vary the version instead.  This is a simple matter of customizing the version number as we have seen already.

 version = hasProperty('publishVersion') ? publishVersion : "latest"  

The user could then pass a branch name using the -P option.  If your Ivy resolver is set up to use "latest" as a keyword for a constantly changing artifact (similar to SNAPSHOT in Maven), then simply appending that keyword to the end of the branch name will retain that behavior.
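
Putting that together, a minimal sketch might look like this (the branch property name and the "-latest" suffix are my own conventions; adjust them to whatever your resolver expects):

 // if a branch was passed with -P, publish e.g. "myFeature-latest";  
 //  otherwise fall back to plain "latest" for the mainline  
 version = hasProperty('branch') ? branch + '-latest' : 'latest'  

Branch builds then land in their own version directory, completely separate from the mainline artifacts, with no changes to project names or dependency declarations.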

Conclusion

Gradle has excellent integration with existing dependency management frameworks.  It easily supports the most common use cases, and is also customizable for edge cases.  Hopefully this has shed a little light on how to customize your published artifacts.


Monday, August 6, 2012

How valuable is documentation really?

"We have come to value ... working software over comprehensive documentation."


So states the Agile Manifesto. I was thinking about this when I ran across this article entitled, "I've Just Inherited an Application--Now What?" It's an essay on the value of documentation in understanding an unfamiliar system, and by that he means technical documentation: architectural-level diagrams, design documents, and so forth. (He even uses the term "code spelunking" which I promise I thought of before reading his blog!)


I admit to being a little perplexed by his premise. Most of my career has been spent working on legacy applications of one form or another, and slowly over time I have come to the opposite conclusion. In most cases I benefited very little from existing documentation, and the value it offered decreased as the complexity of the code base increased. Indeed, documentation appears low on my list of desired artifacts for code spelunking.


Don't misunderstand, that's not to say documentation can't be helpful. The Manifesto doesn't say any documentation, but comprehensive documentation; and its authors don't say it's not valuable at all, but simply that they value working software more. Documents that are well written and up-to-date can indeed save hours of effort trying to understand an application.


But that's really the problem, isn't it? How many times does the UML diagram you find on the company wiki--or worse, emailed to you by a junior developer--accurately represent the actual code base? And how many class or sequence diagrams have you examined that have so many lines, boxes, and arrows, that it's impossible to actually establish the relationships with any clarity?


Software is constantly mutating, assuming it isn't being end-of-lifed. Requirements change, technology evolves, developers join and leave the team; and most of the time you have limited warning in advance of these changes. I have had product owners swear on holy relics that a particular product requirement will remain static forever, only to have it change months later as market demands shift. I have been through migrations in database technology, application servers, build tools, and countless development frameworks. To be of any value in the future, documentation must be maintained in parallel with these changes.


Moreover, documentation costs money. A handful of organizations may benefit from employing a technical writer skilled enough to both read the code and produce well-formed explanations, but in most cases technical documentation is written and maintained by the people who write the code. That is, developers. The time a developer spends crafting a diagram describing a subsystem is time not spent developing working software.


And bless their hearts, they may be excellent coders but they tend to be terrible writers.*


So what do I propose for understanding a system? First, the code itself. The code should be its own documentation. Good comments, yes, but variable names, method names, and the very structure of the code should reveal its purpose. Code is read far more often than it is written, a lesson sometimes lost on those who create it. Applying and enforcing a common set of coding standards, conducting rigorous code reviews (preferably via pair programming), and fostering a commitment to technical excellence all go a long way toward creating code that is self-documenting.


However well-written the code may be, it can really only provide a snapshot view into an application, and usually higher levels of abstraction are helpful. For this purpose, tests serve a critical need as documentation on multiple levels. Unit tests, for example, should not only validate the code, but should be written as a set of use cases for how a class or method is expected to behave. Automated acceptance tests that provide high-level descriptions of business logic are the gold standard of executable documentation. Following the principles of BDD, they should be written in such a way that any stakeholder can understand them. Because they are automated, they are always in sync and always correct. A test failure means either the assumption about the behavior was wrong, or the code itself doesn't actually perform as expected.


In fairness, there are tools that can generate diagrams and documentation from the code (the original blog author's company makes some of them), and these can potentially have value in understanding an application's architecture, assuming the output is human consumable. In my experience, code that's too difficult to read will probably generate complex diagrams that are just as difficult to understand. But I'm willing to concede that if I'm code spelunking, more information is usually better than less.


The times I have found the most value in design artifacts are at the outset of projects. When greenfield code is being written, diagrams and documentation can help communicate ideas between developers and teams. A quick sequence diagram about how you expect components to work can be helpful in communicating exactly what you want written. However, the moment those designs become code, the design document itself becomes outdated. Automated testing should replace it as the primary documentation of the system.


And in the end, I think this is ultimately what that line that I quoted from the Manifesto means. Working software is its own best documentation.


*Irony alert: I'm a developer, and I'm writing a blog.

Sunday, August 5, 2012

An introduction


Hello and welcome! If you're here on this blog and you're the least bit interested in software development, then there may--may--be something for you here in future posts. And if you find any value in it, you might be a little bit curious about the guy who wrote it ... so ...

I'm Ryan.  I'm a married father of two, a rabid Star Wars fan, and a software developer at Pearson Digital Learning in Chandler, Arizona.
I was young enough that I’m not sure exactly when I first used a computer.  But assuming you don’t count the Atari 2600 or ColecoVision, it was probably the single Apple IIe we had in my elementary school class.  And it was love at first sight.  I clearly remember being 11 years old when my father brought home a Compaq Portable, a suitcase-sized, so-called “portable” computer with no hard drive and a tiny green-on-black screen, which his company loaned him on the weekend.  Every moment with that machine was precious.


At 16 I landed a job at the now-defunct software retailer Software, Etc., and that same year borrowed a book on C from the store (we were allowed to do that back in those days).  I don’t even remember now, in those pre-Internet days, where I found a C compiler--maybe MS-DOS shipped with one--but my first program was a simple sales tax calculator.

Eventually I graduated from Brigham Young University with a degree in Computer Science, and spent the next couple of years experiencing the fantastic boom and bust of the dot-com industry.  Throughout my career, I’ve bounced through shops specializing in Visual Basic, C++, Perl, and PHP.  But since 2005 I’ve spent most of my time in the Java world.

I’ve worked for startups and multi-billion dollar companies both, been laid off twice, and built software using waterfall and agile approaches.
In short, computers and software have been part of my life for most of the last 25 years.   It’s a career I knew I would want even as a child, and as a middle-aged adult I’m still thrilled to be a part of it.

So why am I blogging?

I'm not sure, really. This is actually my second shot at keeping a blog. Several years ago I kept an internal blog at Pearson which I think only two people ever read, and when one of them left the company I lost the motivation. In recent months, however, I've found myself wanting to do it again, mostly because I find that recording my experiments and thoughts in the "published" word helps me focus. My readership here may never grow even as large as the last, but hopefully some of my successes and failures will help someone else.

By the way, the term "code spelunker" is something I thought was original, but it turns out I was just Googling the wrong term. Nonetheless, I think it adequately describes what you'll find here.