Monday, February 26, 2024

Jacobin JVM at 30 Months

This month, the Jacobin JVM project reaches its 30-month milestone with release 0.4.0. For the last six months, Richard Elkins (@texadactyl) and I have been working together daily on features, and we've made good progress. Our goal is to have Jacobin run a standard set of benchmarks before year-end. After that, we'll begin to ask for volunteers to test Jacobin with their code. As ever, the larger goal is to deliver a more-than-minimal JVM written entirely in a single language (Go).

To be honest, Jacobin is already much more than minimal, but we want to get it closer to feature parity with the HotSpot JVM, which is the JVM that ships in OpenJDK. During the past six months, we've added:

* exception handling, both caught and uncaught exceptions and errors. For uncaught errors, we try to provide somewhat more detail about the exception than does the HotSpot JVM. However, for users who prefer HotSpot's exact wording, we provide the -strictJDK command-line option.

* improved diagnostic data in trace logs. Prior to this release, our trace logs were focused on the bytecode instructions, showing the class, method, bytecode and the top of the operand stack (TOS). We now print out the entire operand stack with each bytecode instruction so that we can watch data items move up and down the stack as pushes and pops move them. While this generates huge trace listings, it lets us watch the execution of classes in a real-time document.

* handling methods with a variable number of arguments

* static initializer blocks. Initializer blocks are rarely used by developers, but they are crucial to the operation of the JVM. At the language level, they're blocks of code between {{ and }} or freestanding blocks of code marked static { ...code here... }. They're most often used to initialize static variables. The code blocks are executed before any other code in a class, even before a constructor. Inside the JVM, they appear whenever classes use static variables, which means frequently. And they can entail complex chain reactions in which they need to instantiate other classes and run their static initializer blocks.

* revised architecture. One of the confounding aspects of working on a system with so many discrete subsystems that must all interoperate in a carefully choreographed process is that it's difficult to anticipate the exact shape and interfaces a subsystem must have when it's first designed. In part, that's because we generally cannot implement all the features right away--only the essential ones. Gradually, as Jacobin moves forward, earlier decisions to not include certain lesser-used features need to be revised. In this release, we revised how we look up methods and how we handle static variables. In both cases, we simplified existing code. 
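The chain reactions mentioned in the static-initializer item above can be pictured with a small Go sketch. It is only an illustration under stated assumptions: the class type, the dependsOn field, and initClass are hypothetical names, not Jacobin's code, and the model records only the order in which initializers finish, which is a simplification of what a real JVM does.

```go
package main

import "fmt"

// class is a hypothetical, heavily simplified stand-in for a loaded class.
type class struct {
	name        string
	dependsOn   []string // classes that this class's static initializer touches
	initialized bool
}

// initClass simulates running the static initializer for name exactly once.
// Any class it depends on is initialized first. Each class name is appended
// to order when that class's (simulated) initializer completes.
func initClass(classes map[string]*class, name string, order *[]string) {
	c, ok := classes[name]
	if !ok || c.initialized {
		return
	}
	c.initialized = true // mark first, so that cycles do not recurse forever
	for _, dep := range c.dependsOn {
		initClass(classes, dep, order)
	}
	*order = append(*order, name)
}

// demoClasses builds a small, made-up dependency graph for the example.
func demoClasses() map[string]*class {
	return map[string]*class{
		"Main":   {name: "Main", dependsOn: []string{"Config", "Logger"}},
		"Config": {name: "Config", dependsOn: []string{"Logger"}},
		"Logger": {name: "Logger"},
	}
}

func main() {
	var order []string
	initClass(demoClasses(), "Main", &order)
	fmt.Println(order) // [Logger Config Main]
}
```

Even in this toy version, asking for one class pulls in two others, and the run-once flag is what keeps Logger from being initialized twice.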

Hacker News

Jacobin JVM made the front page of Hacker News. That post, by Ye Lin Aug, generated 184 interesting comments. We appreciated this unexpected coverage and did our best to answer the many questions.

What's next

In the next six-month sprint, we are hopeful that we can:

* implement all remaining bytecodes except INVOKEDYNAMIC, which will surely take us longer to complete

* implement java.lang.Class: there are several Java classes that are so dependent on the JVM's design that every JVM needs to implement them by hand. These include classes for threads, debugging classes, and, of course, java.lang.Class...among others.

* add file I/O libraries. (It might seem odd to see this here, but the JDK's file I/O libraries are native functions, so we need to implement them in Go. This will primarily be via use of the Facade design pattern, but there will likely be some additional coding required.)

* expand our work on handling JAR files. Presently, Jacobin does handle JAR files. However, we want to make sure that code is robust enough to handle all details and forms of JAR files, so that execution never fails.

All of this in preparation for running benchmark suites and, eventually, soliciting alpha testers.

In the above text, I've referred to this milestone as a "release." The term is misleading. We're not creating a release, but just marking the code at this 2.5-year anniversary as v. 0.4.0. As discussed on the GitHub project site, we don't yet recommend you try Jacobin. However, by the end of the next sprint, we hope to start inviting folks to give it a try.

Testing

As discussed in previous posts, we're deeply committed to testing. Jacobin's test suites currently run a total of 708 tests, which include 597 unit tests and 111 integration tests. We'll be boosting these numbers significantly in preparation for inviting alpha testers.

Jacobin by the Numbers

At present, Jacobin consists of a production codebase of 15,814 lines (includes code, comments, and blank lines). The testing code consists of 24,178 lines plus 26,874 lines in the Jacotest test suite. This gives 50,912 lines of tests, which is 3.22x the size of the production code. Our eventual goal is a significantly greater multiple. 

If you'd like to show your support for Jacobin JVM, we'd love a ⭐ on GitHub. That helps keep our motivation high! If you want more frequent updates, please follow us on Twitter (@jacobin_jvm).


Wednesday, August 09, 2023

Jacobin at the 2-year Mark

Jacobin (a JVM written entirely in Go) just reached its 2-year anniversary. Since our 18-month update, a lot has happened. We have:

  • Added instantiation of non-static classes
  • Added support for superclasses
  • Implemented the JDK’s native math libraries in Go
  • Added support for multidimensional arrays
  • Added support for compact strings
  • Expanded the interpreter to handle 190 bytecodes (out of 203)
  • Defaulted to using the classes and libraries bundled with the OpenJDK
  • Added significant instruction-level tracing capabilities (see below)

What we’re working on now and taking up shortly:

  • Making sure that our test suites generate the same results as the OpenJDK JVM
  • Adding the final bytecodes to the interpreter. (Some of these are very complicated, so they will likely take a while.)
  • Adding exception handling
  • Adding support for interfaces

Even before these goals are attained, we expect to start running benchmarks and third-party test suites on Jacobin.

Much of the good progress we’ve made since our 18-month update is due to the addition of Richard Elkins (@texadactyl) to the team. He implemented the JDK’s native math libraries and has created a test suite, Jacotest, which grinds on existing and upcoming features.

Tracing and Peering into the JVM

Our progress remains very much aligned with the original goals for Jacobin: a JVM capable of running Java 17 programs, written entirely in Go with no dependencies, delivered as a small executable from a cohesive, extensively commented codebase.

At present, Jacobin is a 3.9MB executable that is tested daily on Windows, Linux, and MacOS. Because it’s a single codebase, we have the pleasure of loading it into our IDE (GoLand, kindly provided by JetBrains) and stepping through the execution of a class bytecode-by-bytecode following the execution path across classes and libraries.

To give us a roadmap, we expanded our already detailed instruction tracing to show the values on the operand stack and other useful details. Here is a sample of the tracing log (available by specifying the -trace:inst option on the command line):

java/lang/StringLatin1 meth: inflate    PC:  30, GOTO       TOS:  -
java/lang/StringLatin1 meth: inflate    PC:   3, ILOAD      TOS:  -
java/lang/StringLatin1 meth: inflate    PC:   5, ILOAD      TOS:  0 int64 22
java/lang/StringLatin1 meth: inflate    PC:   7, IF_ICMPGE  TOS:  1 int64 22
java/lang/StringLatin1 meth: inflate    PC:  33, RETURN     TOS:  -
java/lang/StringLatin1 meth: toChars    PC:  14, ALOAD_1    TOS:  -
java/lang/StringLatin1 meth: toChars    PC:  15, ARETURN    TOS:  0 Object
java/lang/String       meth: toCharArray PC: 14, GOTO       TOS:  0 Object
java/lang/String       meth: toCharArray PC: 24, ARETURN    TOS:  0 Object
main                   meth: main       PC:  41, ASTORE     TOS:  0 Object: &{{68288800 0} <nil> [{[I 0xc000004450}]}
main                   meth: main       PC:  43, GETSTATIC  TOS:  -

(Some entries removed for simplicity.) In this listing, you see on the extreme left, the class name, the method name, the program counter (PC, which is the number of the bytecode being executed), the bytecode, and the value on the top of the stack (TOS). In this, TOS: 0 means there is one item on the stack (at position 0) and its type and value are shown immediately to the right (or on the next line in case of line wrapping).  

Notice that in this excerpt, execution starts in java.lang.StringLatin1/inflate() and eventually returns to the calling function, toCharArray() in java.lang.String. When this completes, it returns to the main method in the class called main, which is loaded with a pointer to an object that consists of an array of integers (in this particular case, an array of chars that form a string).

Testing

As stated in our previous posts, we’re deeply committed to testing. Currently, Jacobin uses a testbed of 618 tests: 525 unit tests and an additional 93 tests in the Jacotest suite. Even at this level, we’re not satisfied with the depth of coverage, and we expect to continue expanding the testing aggressively.

By the Numbers

Jacobin consists of 11,097 lines (this includes code, comments, and blank lines). The 525 unit tests represent 21,465 lines. The Jacotest suite consists of an additional 21,921 lines (mostly Java). This totals 43,386 lines of testing code, which means our test code is currently 3.91x the size of our production code. We aim to increase that ratio as we move forward.

So, where do we stand?

We’re not quite ready for users to begin testing Jacobin. In this coming year, we aim to ship a release that you can try out and test with your own Java classes. At that point, we’ll pivot to improving performance. (If you want to jump the gun, though, you can always download the code and do a build. Instructions on the release page.)

If you want to help the project, we’d love a star on GitHub (this helps keep our motivation high) and perhaps let others know about the project.

Tuesday, February 14, 2023

Jacobin JVM at 18 months

Earlier this month, the Jacobin JVM project (a JVM written in Go) reached its 18-month milestone. Since our post at the 12-month mark, we have added support for numerous Java bytecodes to the interpreter, including all the bytecodes for longs, floats, doubles and their operations, all the bit manipulations, and all operations on single-dimensional arrays of primitives. We've implemented 176 bytecodes at present and expect to finish up the remaining ones we need during the coming six months.

At present, Jacobin can execute simple static classes, which is enough to allow us to test functionality and to begin running benchmarks. While performance has not in any way been a goal during our work, as we get closer to finishing the interpreter, it will assume greater importance. @suresk is already sketching out an observability client, similar to VisualVM and other tools, to guide our optimization work. 

Jacobin continues to meet our initial goals: it is written entirely in Go and has no dependencies. It runs fast and the executable is only 3.1MB (on Windows). It runs Java class files and JARs compiled by Java 7 through Java 17.

By the numbers

Jacobin's codebase consists of 25,813 lines (which include code, comments, and blank lines). As mentioned in earlier posts, we have a very deep commitment to testing, as shown by the fact that this codebase includes 18,015 lines of testing code for the 7,798 lines of production code. This is a ratio of testing code to production code of 2.31x -- our highest to date (as we set out to do in earlier posts). Those 18K lines represent 429 unit and integration tests.

Easy Things You Can Do to Help

While Jacobin is still in pre-alpha mode, if you choose to build it or run one of the posted executables on GitHub, we’d love your feedback. We respond quickly to any and all feedback and questions. In this regard, Richard Elkins (@texadactyl) deserves our heartfelt thanks for running Jacobin on various test files and sharing his results with us.

If you’d just like to show your support for the project, we'd love a star on GitHub. Knowing people are interested in Jacobin really helps keep our motivation and spirits high. If you're on Twitter, please follow our handle (@jacobin_jvm) to keep abreast of what we’re doing.



Tuesday, August 09, 2022

Jacobin JVM At The 1-year Mark

After 12 months, Jacobin, the more than minimalist JVM written in Go, has come quite far. Presently, it can execute simple Java classes and JARs and can do several interesting things, described shortly. The source code, which is available under the Mozilla open source licence and housed on GitHub, contains several sample Java classes that demonstrate the kinds of classes Jacobin can execute accurately and quickly.

Jacobin responds to most of the options listed in java -help, including supporting a range of verbosity options that can log considerable data to the console as the program is running. For a huge amount of output, use the -verbose:finest switch. You can even do instruction-level tracing with the -trace:inst command-line switch.

Jacobin is a single executable with no dependencies. It requires only a JDK distribution on the local machine. Any JDK through Java 17 will work.

Under the covers, Jacobin—like OpenJDK-based JVMs—loads some 1,400 classes in the background. These comprise all the basic classes of the Java distribution. The class loaders in Jacobin perform a detailed parse and format-check of the app classes, with linking and preparation done on-the-fly at execution time.

What’s Next?

The team of Spencer Uresk (@suresk) and Andrew Binstock (@platypusguy) are working primarily in the following areas: completion of the bytecode interpreter (mostly to be completed by Andrew) and designing and developing an observability client, mostly by Spencer. (Observability is the ability to see what’s happening inside the JVM.) The README page on GitHub gives the current status of the various subsystems under development.

By the Numbers

After one year, the Jacobin codebase consists of 21,051 lines (including comments and blank lines). Of those, 14,499 lines make up 291 tests, meaning that the testing code is presently 2.21x the size of the production code. We strive to increase that multiple. The unit tests cover 72% of the production code, while the integration tests cover even more.

This deep commitment to testing is crucial to advancing the project. To move beyond running just the simplest of classes, Jacobin must adopt a lot of the inner complexity of the JVM. Debugging the interactions of many interlocking parts is nobody’s idea of fun. So, for our own peace of mind, we invest heavily in making sure that the code we write works exactly as we expect. And, of course, this also leads to a good user experience.

Easy Things You Can Do to Help

While Jacobin is still in pre-alpha mode, if you choose to build it or run one of the posted executables on GitHub, we’d love your feedback. We respond quickly to any and all feedback and questions. If, instead, you’d just like to show your support for the project, we'd love a star on GitHub. Knowing people are interested in Jacobin really helps keep our motivation and spirits high. If you're on Twitter, please follow our handle (@jacobin_jvm) to keep abreast of what we’re doing.

Thank you for your interest. Onward to year 2!

Sunday, May 08, 2022

Jacobin JVM project after nine months

After nine months, Jacobin has been steadily moving forward. The biggest news of this quarter is that Spencer Uresk (@suresk on Twitter and suresk on GitHub) has joined the project. He's made his presence felt right away by rewriting how Jacobin loads JDK classes at start-up. Previously, we provided a curated set of JDK classes in the Jacobin distribution. These classes were searched for in the directory specified by JACOBIN_HOME. Spencer's improvement is that the classes are now loaded directly from the Java distribution on the runtime system. This means that if your system already has a JDK installed on it, all you need to run Jacobin is the single Jacobin executable file. Spencer has now turned his attention to running JAR files (because at present, Jacobin runs only individual class files).

While Spencer's working on that, I (Andrew Binstock) am continuing the work on the bytecode interpreter. Work is slow but steady and I aim to have it mostly complete by the end of this three-month cycle.

As of this quarter, we are compatible with classes through Java 17 (previously only through Java 11). We don't enforce sealed classes (Java 17's big new feature) but we can execute our test classes from Java 17 just fine.

By the numbers

Project size has risen from 17,588 lines in our codebase to 19,173, which consists of 6,383 lines of production code and 12,790 lines in 248 tests--a 2.003x ratio of test code to production code. We'll look to increase that ratio as we move forward. (It was at 1.4x at the three-month mark, and 2.10x at the six-month mark.)

How you can help

If you're interested in this project and you're on GitHub, we'd love a star. Knowing people are interested in Jacobin really helps keep our motivation and spirits high. If you're on Twitter, follow our handle (@jacobin_jvm). Thanks for your interest and support!

Friday, February 04, 2022

Jacobin JVM project after six months

After six months, Jacobin can now execute many of the most common bytecode instructions. Simple classes that use for-loops, call methods that compute values, and print the results to the screen work correctly. Several of these are now available in the testdata directory on GitHub. For example, Hello3.class performs the following:

public class Hello3 {
  public static void main(String[] args) {
    int x;
    for (int i = 0; i < 10; i++) {
      x = addTwo(i, i - 1);
      System.out.println(x);
    }
  }

  // Returns (j * k) + 1.
  static int addTwo(int j, int k) {
    int m = multTwo(j, k);
    return m + 1;
  }

  // Returns m * n.
  static int multTwo(int m, int n) {
    return m * n;
  }
}

What you're seeing is a loop that calls a method, which in turn calls another method. If you run this without any command-line options, it will print out the expected result (a series of integers ranging from 1 to 73). 
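As a cross-check on that expected result, here is the same arithmetic ported to Go. This is a sketch for verifying the numbers only; it is not part of Jacobin, and hello3Output is a hypothetical helper name.

```go
package main

import "fmt"

// multTwo and addTwo mirror the two static methods in Hello3.
func multTwo(m, n int) int { return m * n }

func addTwo(j, k int) int { return multTwo(j, k) + 1 }

// hello3Output returns the values that the loop in Hello3's main() prints,
// one per iteration.
func hello3Output() []int {
	var out []int
	for i := 0; i < 10; i++ {
		out = append(out, addTwo(i, i-1))
	}
	return out
}

func main() {
	fmt.Println(hello3Output()) // [1 1 3 7 13 21 31 43 57 73]
}
```

Note that the output is ten values that start at 1 and end at 73, not every integer in between.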

If you run it with -verbose:finest, Jacobin will present a wealth of information about what's going on inside the JVM. You can also do instruction-level tracing with -trace:inst. The last few lines of the instruction trace look like this:

class: Hello3, meth: main, pc: 20, inst: INVOKEVIRTUAL, tos: 1
class: Hello3, meth: main, pc: 23, inst: IINC, tos: -1
class: Hello3, meth: main, pc: 26, inst: GOTO, tos: -1
class: Hello3, meth: main, pc: 2, inst: ILOAD_2, tos: -1
class: Hello3, meth: main, pc: 3, inst: BIPUSH, tos: 0
class: Hello3, meth: main, pc: 5, inst: IF_ICMPGE, tos: 1
class: Hello3, meth: main, pc: 29, inst: RETURN, tos: -1

The class and method fields are self-explanatory. pc refers to the location of the bytecode instruction, inst: refers to the actual instruction, and tos: represents the top of the operand stack before the instruction (-1 = empty stack, 0 = 1 item on the stack, etc.)
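The tos convention is easy to see in a few lines of Go. The sketch below is a simplified illustration, not Jacobin's interpreter: operandStack and traceLine are hypothetical names, and the pushed values are assumptions chosen to match the end of the loop in Hello3.

```go
package main

import "fmt"

// operandStack is a hypothetical, simplified operand stack of integers.
type operandStack struct {
	items []int64
}

// tos returns the index of the top of the stack: -1 for an empty stack,
// 0 when there is one item, and so on.
func (s *operandStack) tos() int { return len(s.items) - 1 }

func (s *operandStack) push(v int64) { s.items = append(s.items, v) }

func (s *operandStack) pop() int64 {
	v := s.items[len(s.items)-1]
	s.items = s.items[:len(s.items)-1]
	return v
}

// traceLine formats one entry in the style of the listing above.
func traceLine(class, meth string, pc int, inst string, tos int) string {
	return fmt.Sprintf("class: %s, meth: %s, pc: %d, inst: %s, tos: %d",
		class, meth, pc, inst, tos)
}

func main() {
	s := &operandStack{}
	// Each line is printed before its instruction runs, as in the real trace.
	fmt.Println(traceLine("Hello3", "main", 2, "ILOAD_2", s.tos()))
	s.push(10) // ILOAD_2 pushes the loop counter (assumed to have reached 10)
	fmt.Println(traceLine("Hello3", "main", 3, "BIPUSH", s.tos()))
	s.push(10) // BIPUSH pushes the loop bound
	fmt.Println(traceLine("Hello3", "main", 5, "IF_ICMPGE", s.tos()))
	s.pop() // IF_ICMPGE pops both operands to compare them
	s.pop()
	fmt.Println(traceLine("Hello3", "main", 29, "RETURN", s.tos()))
}
```

Run as written, this reproduces the tos values of the last four lines of the listing: -1, 0, 1, and -1.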

When you run Jacobin, it loads some 1500 Java classes in the background (just like the JVM does). If you want to see the list of these classes, run Jacobin with the -verbose:class command-line option. (A tribute to both the Go language implementation and the design of Java classes is that these 1500 classes can be read, parsed, and posted in less than 300ms.)

Current work

We're presently working on object creation. (So far, none of the test classes requires the creation of new objects.) The next step will then be handling exceptions, and then running classes that are in separate source files. As we create test classes for these developments, we'll perforce be adding new bytecode instructions to our interpreter.

By the numbers

As of February 1 (six months into the project) the project spans 52 files consisting of 17,588 lines. Our pipeline consists of 223 tests and 11,922 lines (code and data) making our testing corpus 2.10x the size of our production code. We'll be looking to increase this ratio going forward. (It was 1.4x at the three-month mark.)

How you can help

If you're interested in this project and you're on GitHub, we'd love a star. Knowing people are interested in Jacobin really helps keep our motivation and spirits high. If you're on Twitter, follow our handle (@jacobin_jvm). Thanks for your interest and support!



Friday, December 17, 2021

How the Jacobin JVM Accesses Methods

Executing methods is the principal activity of the JVM. There are many steps involved in finding, loading, and executing methods correctly. The Jacobin JVM uses a variety of techniques to accelerate this process, as described here. (To follow, you need to know just a little Java.)

Methods in class files

Java methods are stored in class files in a section where various kinds of class attributes are located. Each method contains instructions in the form of Java bytecodes. It also contains a series of attributes that provide additional execution information (such as data for handling exceptions, debugging info, etc.). Methods are stored by name and type, which are represented by indexes into an area of the class file called the constant pool. Those indexes ultimately point to strings in UTF-8 format (actually, a Java-specific variant of UTF-8). A typical example looks like this:

java/io/PrintStream.println:(I)V

This shows the usual println() method that prints an integer to the console. Note that the name of the class precedes the method name. In the class name, the usual dots are replaced by forward slashes. The single dot marks the start of the method name, which is followed by a colon and the method signature. The part in parentheses indicates the parameter type (I=integer) and the V after the closing parenthesis indicates the return type, which here is void (V=void). Note to Java nerds: the signature of a method typically does not include the return value. It's specified here so that the JVM knows what to expect as a return value.
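To make the layout of such a string concrete, here is a minimal Go sketch that splits it into its parts. The methodRef type and the parseMethodRef function are hypothetical names chosen for illustration; this is not how Jacobin itself necessarily does it.

```go
package main

import (
	"fmt"
	"strings"
)

// methodRef holds the pieces of a string such as
// "java/io/PrintStream.println:(I)V". The type is hypothetical.
type methodRef struct {
	Class      string // e.g. "java/io/PrintStream"
	Method     string // e.g. "println"
	Params     string // raw parameter types between the parentheses, e.g. "I"
	ReturnType string // e.g. "V" for void
}

// parseMethodRef splits a fully qualified method string into its parts.
func parseMethodRef(s string) (methodRef, error) {
	classAndMethod, signature, ok := strings.Cut(s, ":")
	if !ok {
		return methodRef{}, fmt.Errorf("missing ':' in %q", s)
	}
	// Class names use slashes, so the last dot separates class from method.
	dot := strings.LastIndex(classAndMethod, ".")
	if dot < 0 {
		return methodRef{}, fmt.Errorf("missing '.' in %q", s)
	}
	closeParen := strings.Index(signature, ")")
	if !strings.HasPrefix(signature, "(") || closeParen < 0 {
		return methodRef{}, fmt.Errorf("malformed signature in %q", s)
	}
	return methodRef{
		Class:      classAndMethod[:dot],
		Method:     classAndMethod[dot+1:],
		Params:     signature[1:closeParen],
		ReturnType: signature[closeParen+1:],
	}, nil
}

func main() {
	ref, err := parseMethodRef("java/io/PrintStream.println:(I)V")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("class=%s method=%s params=%s returns=%s\n",
		ref.Class, ref.Method, ref.Params, ref.ReturnType)
}
```

The sketch leaves the parameter list as a raw string; decoding it into individual types would be a further step.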

Extracting methods for use by the JVM

The classloader is a JVM subsystem that locates classes needed by the application, parses them, and places (or loads) the parsed data into an important area of the JVM called the method area. The method area, despite its name, holds entire classes. When an app requires a method, the JVM looks into the method area and determines whether the class has been loaded. If not, it asks the classloader subsystem to locate and load the class into the method area. Once the class is there, the JVM looks through all the methods, resolves the name and signature strings for each one, and checks whether they match the method being looked for. When a match is found, the bytecodes are loaded and executed. (If the method is not found, a runtime error results.)

This search can be extremely expensive. For example, the Java standard Class.class in Java 11 has 139 methods--that's potentially a lot of look-ups! To save time, most JVMs, including Jacobin, cache the method data once it's been looked up, so that the search is performed only once.

The Method Table

In Jacobin, the caching is done using a method table (see file MTable.go). When a method is invoked, Jacobin (like many JVMs), first checks the method table to see whether the method has previously been located and loaded. If not, then the search as described previously is performed. In Jacobin, the method is located and stored in the method table and then the look-up in the table is performed a second time and the result passed to the calling method.  

Additional Considerations

Thread safety: The method table, like the method area, is a JVM-wide data structure. That is, all executing threads in the JVM can access it. As a result, it's conceivable that two threads would be updating the method table simultaneously. To avoid this problem, the table uses a mutex lock on every update. 
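A cache of this kind, guarded by a lock, might look roughly like the Go sketch below. It is an illustration under assumptions, not the contents of MTable.go: the methodTable type, its fields, and the lookup function are hypothetical, and the use of a read/write mutex (so that readers do not block each other) is my choice for the example rather than something stated above.

```go
package main

import (
	"fmt"
	"sync"
)

// method is a hypothetical stand-in for a resolved method.
type method struct {
	name     string
	bytecode []byte
}

// methodTable caches resolved methods, keyed by fully qualified name.
type methodTable struct {
	mu       sync.RWMutex
	entries  map[string]*method
	searches int // counts slow-path searches, for illustration only
}

func newMethodTable() *methodTable {
	return &methodTable{entries: make(map[string]*method)}
}

// lookup returns the cached method for key. On a miss, it runs search (the
// expensive scan of the class's methods) and stores the result, so that the
// search happens only once per method.
func (t *methodTable) lookup(key string, search func(string) (*method, bool)) (*method, bool) {
	t.mu.RLock()
	m, ok := t.entries[key]
	t.mu.RUnlock()
	if ok {
		return m, true
	}

	t.mu.Lock() // every update happens under the lock
	defer t.mu.Unlock()
	if m, ok := t.entries[key]; ok {
		return m, true // another thread stored it while we waited
	}
	t.searches++
	m, ok = search(key)
	if !ok {
		return nil, false
	}
	t.entries[key] = m
	return m, true
}

func main() {
	table := newMethodTable()
	// slowSearch stands in for scanning the class in the method area.
	slowSearch := func(key string) (*method, bool) {
		return &method{name: key, bytecode: []byte{0xb1}}, true // 0xb1 = RETURN
	}

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			table.lookup("java/io/PrintStream.println:(I)V", slowSearch)
		}()
	}
	wg.Wait()
	fmt.Println("slow searches performed:", table.searches) // 1
}
```

Holding the lock while the search runs keeps the sketch short, at the cost of making other updates wait; a production design would likely weigh that trade-off differently.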

Performance: While developing the many capabilities of a JVM, Jacobin is aiming for acceptable performance. Eventually, though, we'll be working very hard to maximize performance. Some of the techniques we have in our notebooks for future enhancements (some of which are used in other JVMs):

  • For the main() class and other classes that might appear in the same JAR, loading the methods directly into the method table, rather than waiting for the initial method search to load them. 
  • When a method is loaded into the method table, deleting it from the class entry in the method area. There is no need to have the same data in memory twice. Doing this reduces the memory footprint of the JVM.
  • When a class's methods are searched for a match (that is, prior to an entry in the method table), if no match is found in the class, then the superclass must be checked. If that fails, then that superclass's superclass is checked and so on up the chain until java.lang.Object is reached at the top of the object hierarchy. A simple optimization is to give every loaded class a complete list of all the superclass methods with pointers to them, so that the JVM does not have to climb the hierarchy in its search, but can tell quickly whether the method exists or not.
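The last idea in the list above, climbing the superclass chain versus consulting a precomputed list, can be sketched in Go as follows. All the names here (class, findMethod, flatten, demoHierarchy) are hypothetical, and the two made-up classes Animal and Dog exist only for the example.

```go
package main

import "fmt"

// class is a hypothetical, simplified loaded class.
type class struct {
	name    string
	super   *class          // nil for java/lang/Object
	methods map[string]bool // methods declared directly in this class
}

// findMethod climbs the superclass chain until it finds key. It returns the
// name of the class that declares the method.
func findMethod(c *class, key string) (string, bool) {
	for cur := c; cur != nil; cur = cur.super {
		if cur.methods[key] {
			return cur.name, true
		}
	}
	return "", false
}

// flatten precomputes, for one class, every method visible from it and the
// class that declares it. A method in a subclass shadows the superclass's.
func flatten(c *class) map[string]string {
	all := make(map[string]string)
	for cur := c; cur != nil; cur = cur.super {
		for key := range cur.methods {
			if _, seen := all[key]; !seen {
				all[key] = cur.name
			}
		}
	}
	return all
}

// demoHierarchy builds Dog -> Animal -> java/lang/Object.
func demoHierarchy() *class {
	object := &class{name: "java/lang/Object", methods: map[string]bool{
		"toString:()Ljava/lang/String;": true,
		"hashCode:()I":                  true,
	}}
	animal := &class{name: "Animal", super: object, methods: map[string]bool{
		"toString:()Ljava/lang/String;": true,
		"speak:()V":                     true,
	}}
	return &class{name: "Dog", super: animal, methods: map[string]bool{
		"fetch:()V": true,
	}}
}

func main() {
	dog := demoHierarchy()

	owner, _ := findMethod(dog, "hashCode:()I") // climbs two levels
	fmt.Println(owner)                          // java/lang/Object

	table := flatten(dog) // built once, when the class is loaded
	fmt.Println(table["toString:()Ljava/lang/String;"]) // Animal
	_, exists := table["fly:()V"]
	fmt.Println(exists) // false, with no climbing required
}
```

The flattened map trades memory for speed, which is the opposite trade from the previous bullet, so the two ideas would need to be balanced against each other.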

There are surely other optimizations and refinements, which we hope to explore and to include if they lead to better execution.



Tuesday, November 02, 2021

Jacobin JVM project after three months

Development on Jacobin, the JVM written in Go that supports Java 11, has been proceeding rapidly. In the 100 days since the beginning of the project, there have been 314 pushed commits. I'll give more stats below. Here's where we stand:

Jacobin can read, parse, format check, and load class files. This process happens very quickly. For example, running all these steps on one of the largest classes in the JDK distribution, BigDecimal.class, takes just 2ms. When parsed, BigDecimal has 1567 entries in its constant pool, 37 fields, and 167 methods. That's a huge class! 

When a class is loaded by Jacobin or any other JVM, it necessarily pulls in other classes to be loaded. For example, all classes run from the command-line have a superclass. Often, that superclass is java.lang.Object, which depends on other classes. Among these are java.lang.Class and java.lang.String; various I/O classes are needed as well. The OpenJDK-based JVMs (essentially, all JVMs except IBM's J9 and some embeddable VMs) address this need by preloading hundreds of widely used classes at JVM start-up. For a look at the list of all the classes loaded just to display the JVM version info, run this from the command line:

java -verbose:class -version

On my Java 11 test system, this command preloads 381 classes (in 347ms!). While Jacobin does not need as many classes loaded to run the specified class, it needs a subset of them. The next step in the project is to identify the required classes and load them quickly. To this end, loading operations (parsing and format checking) will need to be done in parallel. Fortunately, one of the Go language's strengths is a rich set of easy-to-use resources for precisely this kind of concurrent operation.
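To give a flavor of what that could look like, here is a minimal Go sketch that checks several classes concurrently using goroutines and a sync.WaitGroup. It is an assumption about one possible shape, not a description of Jacobin's loader: parsedClass, parseAndCheck, and loadAll are hypothetical names, and the only "format check" performed is the test for the 0xCAFEBABE magic number that begins every class file.

```go
package main

import (
	"fmt"
	"sync"
)

// parsedClass is a hypothetical result of parsing one class file.
type parsedClass struct {
	name string
	size int
}

// parseAndCheck stands in for parsing and format-checking a class file.
// Here it verifies only the 0xCAFEBABE magic number.
func parseAndCheck(name string, raw []byte) (parsedClass, error) {
	if len(raw) < 4 || raw[0] != 0xCA || raw[1] != 0xFE || raw[2] != 0xBA || raw[3] != 0xBE {
		return parsedClass{}, fmt.Errorf("%s: bad magic number", name)
	}
	return parsedClass{name: name, size: len(raw)}, nil
}

// loadAll checks every class in its own goroutine and collects the results
// in a map guarded by a mutex.
func loadAll(files map[string][]byte) (map[string]parsedClass, []error) {
	var (
		wg     sync.WaitGroup
		mu     sync.Mutex
		loaded = make(map[string]parsedClass, len(files))
		errs   []error
	)
	for name, raw := range files {
		wg.Add(1)
		go func(name string, raw []byte) {
			defer wg.Done()
			pc, err := parseAndCheck(name, raw)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				errs = append(errs, err)
				return
			}
			loaded[name] = pc
		}(name, raw)
	}
	wg.Wait() // block until every goroutine has finished
	return loaded, errs
}

func main() {
	files := map[string][]byte{
		"java/lang/Object": {0xCA, 0xFE, 0xBA, 0xBE, 0x00, 0x00},
		"java/lang/String": {0xCA, 0xFE, 0xBA, 0xBE},
		"not/a/Class":      {0x01, 0x02, 0x03, 0x04},
	}
	loaded, errs := loadAll(files)
	fmt.Println("loaded:", len(loaded), "errors:", len(errs)) // loaded: 2 errors: 1
}
```

One goroutine per class is fine for a sketch; with hundreds of classes, a bounded pool of workers might be a better fit.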

After this task is completed, work will begin on execution. 

Testing Thoroughly

One of the principal goals of Jacobin is to be a reliable JVM. This requires disciplined work in the planning, development, and testing. Development is based entirely on tasks, which are logged in a cloud instance of JetBrains' excellent tool, YouTrack (graciously provided for free). You can see the presence of this tracking in that every commit on GitHub starts with the corresponding task name. (Presently, the most recent task is JACOBIN-89.) Quality of the code is reviewed by automatic linters on GitHub. Currently, the code merits an A+. The goreport badge on the jacobin GitHub project takes you to the most recent report.

Testing is done on a near-fanatical basis. Let me explain:

In 2005, I was a contractor with Agitar, a now-shuttered company that made a tool which would read a Java codebase and generate unit tests for missing areas of coverage. It worked great. In conversations with their sales engineers, they told me they used a back-of-the-envelope calculation to assess a company's commitment to testing. They compared the size of the test codebase to the production code. If the test codebase was 50% the size, the company had some commitment to testing. Over 80% was a clear and strong commitment to testing, and over 100% meant a deeply engrained testing culture. 

The current code base of Jacobin consists of 8,342 lines (includes: code, comments, blank lines). Of those, 4,718 lines are in tests. That is, the testing codebase is 130.2% the size of the production code. The goal is to get that ratio even higher. Future quarterly updates will reveal our success in this effort.
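For anyone who wants to check the arithmetic, the ratio is the test lines divided by the production lines, where production lines are the total minus the test lines. A small Go sketch (testRatio is a hypothetical helper, not part of the project):

```go
package main

import "fmt"

// testRatio returns test lines as a multiple of production lines, given the
// total size of the codebase and the number of lines that are tests.
func testRatio(totalLines, testLines int) float64 {
	productionLines := totalLines - testLines
	return float64(testLines) / float64(productionLines)
}

func main() {
	const total, tests = 8342, 4718
	fmt.Println("production lines:", total-tests)                            // 3624
	fmt.Printf("test/production: %.1f%%\n", testRatio(total, tests)*100) // 130.2%
}
```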

Want to help?

It's always great to know a project is interesting to others. If Jacobin interests you and you want to encourage its progress, a GitHub star is our preference. If you want to participate more directly, let me know in the comments, which are kept private. We also love code reviews and suggestions, and later on, we'll surely need folks to do testing. Whatever your interest, thanks for your time!


 




Thursday, August 05, 2021

A Whole New Project: A JVM

Ever since I started out in programming, I've wanted to undertake a programming project that was developed with the rigorous approach used in mission-critical software: write out the requirements; enforce traceability between requirements, code, and tests; and, of course, do rigorous testing.

The main problem has been finding the time to dedicate to such a project. There is a reason that the agile movement eschews this approach: it is the opposite of agility--it relies on an unchanging product definition, relies on extensive documentation, and does not accept the concepts of failing fast and releasing often. It's a whole different mindset to "fail never and release when ready."

In the light of these constraints, the ideal project is one with a well-defined set of specifications. I've decided to meet that need by writing a simplified version of one of my favorite pieces of software: the Java Virtual Machine (JVM).

The specs for much of the JVM are published in detail and updated by the Java team at Oracle with every new release. You can find them here. On the basis of these docs alone, the JVM is the best documented virtual machine in commercial use. There are many additional resources available, such as the excellent articles by Ben Evans and Aleksey Shipilev (both of Red Hat) on how the innards of the JVM work. And, I should add, the source code to the JVM is publicly available.

My project is entitled Jacobin and can be accessed at jacobin.org, which for the time being (and possibly permanently) points to the Jacobin project page on GitHub. There you'll find a detailed write-up of the project status.

Choosing a Language

I have spent the last eight months researching the JVM--reading the docs and articles and doing exploratory coding in various languages with which to write the Jacobin JVM. My requirements for the implementation language are simple enough: it must have decent tools and a viable ecosystem, it must compile to native code on the three major platforms (Windows, Mac, and Linux), and it must have built-in garbage collection (GC). The last requirement is important. The JVM performs garbage collection, but I don't want to write a garbage collector. They are exceedingly difficult tools to write and, especially, to debug. By using a language that does its own GC, a huge amount of work has been removed from the project.

Three languages meet my requirements: Dart, Swift, and Go. I've written several thousand lines of code in the first two and have eliminated them from consideration. Here is why. Dart is a lovely language, but it's slow (even when compiled to binaries), its ecosystem is wanting, and the kind of threading it does is a poor match to the JVM. The problem with the ecosystem is exemplified by the nearly complete absence of books on the language since Dart 2.0 came out a few years ago. Almost all written tutorials are way out of date. Those that are current focus, without exception, on Flutter--the UI toolkit that dominates the use cases for Dart. As a result, it's not easy to learn Dart in depth unless you want to focus primarily on Flutter. The Dart team should really address this. As to the threading model, it is based entirely on single-channel message passing: there is no shared memory. The JVM must perforce share memory between threads and so even if Dart were faster and the docs were up-to-date, it would not meet my needs.
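To make the shared-memory point concrete, here is a minimal Java sketch of the kind of program a JVM has to support: two threads updating one object on the heap. The class and method names are mine, invented purely for illustration. A pure message-passing model has no direct equivalent of the shared counter below.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration: two threads incrementing one shared counter.
final class SharedMemoryDemo {

    static long countWithTwoThreads(int incrementsPerThread) {
        // One object on the heap, visible to both threads.
        AtomicLong shared = new AtomicLong();
        Runnable work = () -> {
            for (int i = 0; i < incrementsPerThread; i++) {
                shared.incrementAndGet(); // atomic update of shared state
            }
        };
        Thread first = new Thread(work);
        Thread second = new Thread(work);
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return shared.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithTwoThreads(1_000)); // prints 2000
    }
}
```

Any JVM, Jacobin included, has to make this sort of code behave correctly, which is why the implementation language needs threads that can see the same memory.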

Swift is a truly beautiful language. It's rich in features and has a lot of the type-checking and code safety rules of Rust, but without the endless head-banging that Rust entails. I would have loved to write the JVM in Swift, but it has several drawbacks: it doesn't run on Windows and its libraries are intimately tied to the Mac. Let me clarify. There is an official version of Swift for Windows, but it's maintained entirely by a single engineer at Google. There are effectively no docs for this version and the installation instructions don't work no matter how much tweaking and configuration I have done. The second problem is that while Swift is trying to become a language that works beyond just Apple platforms (for example, it runs fine on Linux), this worthy goal is far from realized, especially when it comes to libraries. Consider that the equivalent of libzip (which is a core library in most languages--it is used to compress/decompress data using the zip format) is maintained by a third party on GitHub in a project that has at present 22 stars. The collections library has at most a handful of basic data structures, etc. Unless I want to write many of these libraries myself--which I have no desire to do--I am forced down the same road as Node developers: grabbing bits of functionality here and there from different contributors, many of which have unknown code quality. The alternative is to use Apple's Cocoa frameworks on the Mac, which would make my project Mac-only. In sum, until Swift grows its non-Mac ecosystem, it's not a viable option for this project--much to my chagrin.

This leaves Go, which is an easy-to-learn language that runs well on the major platforms and has a flourishing set of libraries, many of which are maintained by core Go developers. While it checks all the boxes, it presents its own challenges. For example, it's the only one of the languages that is not object-oriented and the transition from thinking in objects (after all, Java is my home language, so to speak) to using an imperative style of coding requires some rewiring of how I approach problems. In addition, the standard Go tools have weaknesses. For example, the testing framework is minimal--there is nothing like JUnit in terms of range of features. In the language itself, return values for errors and the lack of generics both feel a little crude, especially to someone coming from Java. Nonetheless, it looks like the best option for my project.

There was one other language candidate: Java. That is, write a JVM that runs on the JVM. I don't find this interesting at all. The code for the JVM is currently mostly written in Java and I'll be consulting it frequently--so what would I do then? Cut and paste? Rewrite the code in my preferred style? It's hard to see how that's an advantage.

What's Next?

In the next few months, I'll continue writing requirements and traceability docs and work through various Go books to transition from beginner Gopher to advanced, so that coding can proceed apace, rather than through constant searches. By that time, I should be in good position to rewrite the 2500 lines of Java-bytecode parsing routines I wrote in Swift, finish that parser, and then begin working on building the execution environment. 

In my next blog post, I'll write about the benefits of such a project and how personal projects like this deliver unexpected rewards.

In the meantime, if you want to show your interest or support, follow the project on GitHub or give it a star, so that I know I'm not working alone in a dark alley.



Thursday, November 18, 2010

The Most Important Book of The Year


Continuous Delivery
by Jez Humble and David Farley


I have reviewed many books on this website and I have gone through numerous others as part of my work on the Jolt Awards, but it’s been a very long time since I’ve read a book as useful and likely game-changing as Continuous Delivery.

The basic premise of the book is that we need to move past continuous integration into a fuller cycle of activities that go beyond build and test. Specifically, this new orientation calls for building and testing on all platforms, creating and deploying the final deliverables for all platforms—with every check-in. The benefit of this approach is that the development organization at any given moment always has: 1) immediate feedback on deployment issues, 2) a deployable binary; 3) a completely automated process to build, test, and deploy on all platforms.

This simple concept—a kind of continuous integration on mega steroids—has profound repercussions, all of which make your process better. The first and most important is that you have to automate everything downstream from the coding. And the authors mean everything. The most common point where people hem and haw about automation is deployment. But Humble and Farley make it clear you have to “bring that pain forward,” and fix the process so it can be automated. (If you don’t have any idea how you might refine and automate deployment, think virtualization. Can you emulate your current systems on virtual machines and then progressively simplify deployment of the software to the point of automation? Good, you’re on your way.)

But the mechanics of deployment may be the least of your challenges. (And here, the book’s name could be viewed as misleading: deployment is only one aspect it covers.) You also have to build, run, and test the software on every platform you ship on. You’re not reasonably going to be able to do that if you have to change configurations and manually reset values for different platforms. The authors guide you to finding the one path that gets you across the river Jordan without spending 40 years in the desert of bit twiddling. The key is to use a single codebase and move the platform-dependent stuff into configuration files. This is non-trivial, but the authors offer plenty of good advice.
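As a rough illustration of that last idea (my own sketch, not an example from the book), the code below keeps one deployment routine and pushes the per-platform differences into properties text. The keys, paths, and class name are all hypothetical; in a real project each platform's settings would live in its own file rather than in string constants.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch: one codebase, platform differences held in configuration.
final class PlatformConfigDemo {

    // Stand-ins for per-platform configuration files (assumed contents).
    static final String LINUX_CONFIG = "install.dir=/opt/myapp\n";
    static final String WINDOWS_CONFIG = "install.dir=C:/MyApp\n";

    static Properties configFor(String platform) {
        String text;
        if ("linux".equals(platform)) {
            text = LINUX_CONFIG;
        } else if ("windows".equals(platform)) {
            text = WINDOWS_CONFIG;
        } else {
            throw new IllegalArgumentException("no configuration for platform: " + platform);
        }
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // The deployment logic is identical everywhere; only the looked-up value differs.
    static String deployStep(String platform) {
        return "copy myapp.jar to " + configFor(platform).getProperty("install.dir");
    }

    public static void main(String[] args) {
        System.out.println(deployStep("linux"));   // prints: copy myapp.jar to /opt/myapp
        System.out.println(deployStep("windows")); // prints: copy myapp.jar to C:/MyApp
    }
}
```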

Testing is another topic Humble and Farley explore in great depth. Testing in the context of continuous delivery is not just running unit tests and a regression suite. No, this is running all tests: unit, integration, UAT, and so on. How to automate them effectively occupies probably the largest chunk of the book. Even if you don’t accept the continuous delivery concept, this section is worth the price of admission. It’s mind-expanding, in ways that the hundreds of articles we’ve all read about agile testing on Digg and Reddit never touch on. You see very quickly how much more automation you could do and how to get from your miserable semi-manual existence to the smooth flow of full and continuous automation.

What impresses about the book is how the authors consistently work through hard problems. They are not daunted by them and there is no attempt to pass over them with hand waving. Hard things are examined in detail with a perspective that derives from the authors’ own extensive experience.

I have literally never read a better book on process. I believe that going forward, this book will redefine agile process and CI; and it will have as much influence as--I have to go back to 1999, here--Fowler’s book on Refactoring did on code.

Monday, October 25, 2010

Bluebeam's PDF Creation Tool Suite

I use a variety of PDF tools in my editorial work. I frequently create, mark up, manipulate, and combine PDFs. In addition, I contribute to the open source Platypus typesetting project, whose major output format is PDF. And the PDF plugin is my specific bailiwick. So, over the years, I've come to know a thing or two about PDFs, as well as the limitations of PDF tools.

The standard for PDF tools has been Adobe's Acrobat suite. But this suite is expensive, somewhat quirky, and at times works poorly with other tools. Acrobat plugins to Microsoft Office and Internet Explorer are especially unreliable, and they frequently make their host programs behave erratically. I always uninstall them.

This means I need to use other options to convert Word documents to PDF. There are several common solutions out there, but only one of them is completely satisfactory. For example, the Microsoft Office PDF plugin does not embed all fonts, nor does it give you the option to do so. In particular, it does not embed the Base14 fonts.

This is a design error (that is common). Here is its history. For many years, Adobe guaranteed that Adobe Acrobat Reader would provide 14 fonts (the so-called Base14 fonts) in all implementations. These fonts were Times Roman, Courier, and Helvetica typefaces (each in regular, bold, italic, and bold italic—so 12 fonts) plus a Symbol and a Dingbat font. The rule was you did not need to embed these fonts in PDF documents, because Acrobat Reader would supply them. This scheme never worked very well. Its first limitation was that not all Times Roman fonts looked the same, so the same document could look strikingly different on two different computers. A few years ago, Adobe quietly discontinued supporting Base14 fonts in Acrobat Reader. The result is that if you're creating a PDF for distribution, you must embed all fonts, even the old Base14 fonts, if you want it to maintain your original format and layout.

The Microsoft Office plugin does not have this option, so as a result PDFs you generate with it are not guaranteed to look correct on other systems. And, in fact, they frequently do not.

The PDF generator that comes with Adobe Acrobat (not the Reader, but the paid tools) works better. It does offer an option to embed all fonts. However, in Word documents with many links, it fails to identify all links. And so rather than being clickable, the links show up as pure text.

To remedy this, I tested various Word-to-PDF tools and found none that consistently met all requirements until I ran into Bluebeam PDF Revu, a tool I had not previously heard of.

The first thing I noticed was that Bluebeam's plugins were stable and they worked correctly. The second thing I discovered was that Revu found all links in documents and, by default, it embedded all fonts. So far, so good. The attention to small details in its PDFs is part of Bluebeam's DNA: it was designed as a tool for CAD users, so correctly rendering every detail of a document is a specialty.

Like the Adobe Acrobat toolbox, Revu provides editing capabilities, with better text mark-up tools than Acrobat. It also enables you to construct your own menu of tools for faster access to frequently performed operations. Form handling, digital signatures, etc. work exactly as expected. Multi-document processing can also be automated with the product. Adobe Acrobat Pro—the comparable offering from Adobe—retails at $449 list, and $350 at Amazon. The academic version of Acrobat can be found for the same price as the full Bluebeam Revu ($149) product. So, if you want the full range of options, better implemented than in Adobe's offering, and at a lower price, have a look at Bluebeam PDF Revu. (They offer a 30-day free trial.)

Thursday, February 04, 2010

Keeping LOC and Tests in Balance


The proliferation of metrics in software development threatens to take important quantitative measures and bury them beneath an avalanche of noisy numbers. Consequently, it's important to look for certain ratios and trends among the numbers to inform you whether a project is healthy. One tell-tale relation links LOCs and number of tests. These two values should grow in direct proportion to each other.

The included diagram presents the ratio of these two values for Platypus, the OSS project I work on.

As you can see, except for a few dips here and there, these numbers have stayed in lock step for the last 18 months. And, as you might expect, code coverage from these tests has similarly remained in a fairly narrow range--right around 60%.

The most typical violation of this ratio is, as you would guess, a jump in LOCs without a corresponding rise in tests. This is something managers should watch out for. With a good dashboard, they can tell early on when these trend lines diverge. This is frequently, but not always, indicative of a problem. (For example, it could be that a lot of code without tests was imported to the project.) Whatever the cause is, managers need to find out and respond accordingly.
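If your dashboard does not flag this divergence for you, the check is easy to script. The sketch below is one possible way to do it, with assumptions of my own: it measures tests per thousand LOC at each snapshot and flags any snapshot where that figure falls more than a chosen fraction below the first snapshot's value. The numbers in main are invented for illustration; they are not Platypus data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: flag snapshots where test count lags behind LOC growth.
final class TestRatioCheck {

    /**
     * Returns the indices of snapshots whose tests-per-KLOC ratio is more than
     * `tolerance` (a fraction, e.g. 0.15) below the ratio of the first snapshot.
     */
    static List<Integer> divergingSnapshots(int[] loc, int[] tests, double tolerance) {
        List<Integer> flagged = new ArrayList<>();
        double baseline = 1000.0 * tests[0] / loc[0];
        for (int i = 1; i < loc.length; i++) {
            double ratio = 1000.0 * tests[i] / loc[i];
            if (ratio < baseline * (1.0 - tolerance)) {
                flagged.add(i);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        // Invented sample data: LOC jumps at the last snapshot, tests barely move.
        int[] loc = {10_000, 12_000, 15_000, 21_000};
        int[] tests = {400, 480, 590, 600};
        System.out.println(divergingSnapshots(loc, tests, 0.15)); // prints [3]
    }
}
```

A flagged snapshot is only a prompt to go and look, for the reason given above: an import of untested code trips the same alarm as a team that has stopped writing tests.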

(For the record, the tests counted in this diagram include unit tests and functional tests.)

Sunday, November 22, 2009

The Limitations of TDD

During the last 12-18 months, TDD has broken into the mainstream, it seems. And now, we're starting to see some backlash, as its limitations become better understood. Here is a sample discussion from Artima.com. Cédric Beust, who wrote the commentary, is not some unknown guy with a weird name. He wrote the TestNG unit testing framework, which is second only to JUnit in popularity. He also wrote the book, Next Generation Java Testing, which is probably the best book on pragmatic software testing that I've read in a long time. Here goes...

> That's an interesting point. Are you, in effect, saying
> that unit testing is overly emphasized, and at the expense
> of other forms of testing?


This has also been my experience, although to be honest, I see this problem more in agile/XP literature than in the real world.

This is the reason why I claim that:

- TDD encourages micro-design over macro-design
- TDD generates code churn

If you obsessively do TDD, you write tests for code that you are pretty much guaranteed to throw away. And when you do that, you will have to refactor your tests or rewrite them completely. Whether this refactoring can be done automatically or not is beside the point: you are in effect creating more work for yourself.

When I start solving a problem, I like to iterate two or three times on my code before I'm comfortable enough to write a test.

Another important point is that unit tests are a convenience for *you*, the developer, while functional tests are important for your *users*. When I have limited time, I always give priority to writing functional tests. Your duty is to your users, not to your test coverage tools.

You also bring up another interesting point: overtesting can lead to paralysis. I can imagine reaching a point where you don't want to modify your code because you will have too many tests to update (especially in dynamically typed languages, where you can't use tools that will automate this refactoring for you). The lesson here is to do your best so that your tests don't overlap.

--Cedric Beust

Tuesday, August 04, 2009

My Interview with Alexander Stepanov and Paul McJones

InformIT.com has posted my interview with Alexander Stepanov (of STL fame) and his co-author Paul McJones. Their just-released book, Elements of Programming, tries to map algorithm implementations back to symbolic logic and algebraic theorems, thereby--in theory--improving their design and correctness.

In the discussion, we broach many topics that derive from this approach to programming.

Saturday, July 25, 2009

Groovy Books

I have been using Groovy to write functional tests for Platypus, the open-source typesetting project I work on. I am likely to make Groovy the default scripting language for Platypus in the next milestone. In the process, I've had to come up to speed on Groovy and I've been reading through and looking over the various Groovy titles on the market. Here's my take.


The Groovy bible today, without the slightest doubt, is Groovy in Action, which at 650+ pages is also the most detailed book. Its principal limitation is that Groovy has undergone several revisions since it came out. Because of this, a second edition is being written. Early access to e-drafts of that edition is available here, although little as yet has been published.

If you'd like a shorter and more up-to-date introduction to Groovy, I recommend Programming Groovy by Venkat Subramaniam. At less than 300 pages, it's a quick read, provides all the needed info quickly, and covers all the highlights, with a good balance of detail.

Many people consider Grails to be the killer app for Groovy. It's a web framework that rides above Spring and Hibernate and removes much of the complexity of using those components. If you are learning Groovy to use Grails, then Beginning Groovy and Grails is an excellent choice. It's clear, approachable, and teaches you enough Groovy to be able to follow the tutorial on Grails.

Once you get comfortable with basic Groovy, you'll quickly find yourself pining for a book of recipes that shows you how to quickly get basic tasks done using Groovy metaphors. There are two somewhat flawed recipe books on the market. The first is Groovy Recipes from Scott Davis, a well-regarded lecturer in the Groovy area. While calling itself a recipe book, it frequently diverges into tutorials and odd humor--both of which are obstacles when trying to find information. Some important topics are not covered at all, such as testing--which is one of the major areas where Groovy benefits Java. Database access is also not covered. In other areas, Davis' explanations seem to lack an understanding of what the user would be looking for. Nonetheless, I have successfully used some of Davis' recipes in my work. A good alternative is Groovy and Grails Recipes from Bashar Abdul-Jawad. This title is a true recipe book and very readable. The Groovy portion is too short, however, and an important section on file recipes (which does appear in the Davis book) is omitted. However, if you're learning Groovy to get to Grails, this is the best choice. And Abdul-Jawad does a good job understanding what readers are looking for.

Ideally, O'Reilly would publish one of its trademark comprehensive recipe books and we could all settle on that. However, when I contacted O'Reilly about upcoming Groovy titles, the company indicated it had none in the immediate pipeline.

That's pretty much it for Groovy books; although there are several others that focus exclusively on Grails. One publisher, Apress, seems to dominate that Grails market. The two titles above that cover Grails are from Apress as is the Definitive Guide to Grails, written by Graeme Rocher, who designed Grails. In the past I've been skeptical of Apress books due to wide variations in their quality, but the Groovy/Grails titles I've examined have been consistently of high quality.

As Groovy gains a wider audience, I expect more titles to emerge from all the technical book publishers.




Wednesday, May 20, 2009

The Fan programming language: compile to Java and .NET

I have recently been playing with Fan, a programming language that reminds me a lot of Groovy, but has additional capabilities, such as actors. Its binaries run either on the JVM or .NET. Below is my recent column in SDTimes about the language. 

In recent times, we are seeing an extraordinary proliferation of new languages. On one hand, thousands of domain-specific languages (DSLs) have been spawned by the advent of tools that facilitate their creation. On the other hand, we find an equal surge in full-scale, general-purpose programming languages.

 The renaissance of these larger programming languages derives from several advances: 1) a renewed interest in dynamic languages and their benefits; 2) hardware that’s fast enough to run dynamic languages rapidly; and 3) the existence of two run-time environments—the JVM and the .NET CLR—that are widely used, well understood, and fast. As a result, we have an embarrassment of language choices that was inconceivable a decade ago.

In this column, I have previously highlighted various interesting options among these languages: Ruby, Groovy, D, NetRexx, and a few others that elegantly address specific problems. Recently, I have been spending time with the Fan programming language, which, while still early in its development cycle, is more finished and mature than most new languages at this point in their development.

Fan is a dynamic, OO language that runs on the JVM and the .NET CLR. It does this by generating intermediate code (called fcode) that is dynamically translated into Java bytecodes or a .NET DLL at startup. This step introduces a slight pause, after which programs run at full “native” speed for the given environment.

New languages arise because a developer needed to solve a problem that was not addressed well by common alternatives. The developers of Fan, a pair of brothers—Brian and Andy Frank—worked on embedded Java applications and found it difficult to sell the accompanying software to customers who were committed to Windows Mobile and .NET. So, they decided to write Fan to solve the problem and to keep it small enough that it could fit easily in a mobile device. 

In the process, they removed language verbosity and added features they wanted. Their vision is remarkably balanced and complete. The language, on the verge of freezing its 1.0 features, offers: dynamic typing and/or strong typing (à la Groovy), closures and first-class functions, extensive concurrency support (thread-safe classes with immutability specified, threads with built-in message passing, and actors), and elegant handling of various namespace issues. Low-level features include default method parameters, nullable data types, built-in field accessors, unchecked-only exceptions, and simplified numerics. The numerics handle the overflow problem that is the favorite of language puzzle writers: all integers are longs and all floats are doubles. So either type uses 64 bits and effectively does not overflow. Chars are 16-bit UTF entities.
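For readers who have not met the puzzle in question, here is the Java version of it (Java, not Fan; the class and method names are mine). The same arithmetic silently wraps around when done in 32-bit ints and comes out right in 64-bit longs, which is the case Fan makes the only one.

```java
// Java illustration of the classic overflow puzzle that 64-bit-only integers sidestep.
final class OverflowDemo {

    // 32-bit arithmetic: the product exceeds Integer.MAX_VALUE and wraps around.
    static int microsInInt(int hours) {
        return hours * 60 * 60 * 1000 * 1000;
    }

    // 64-bit arithmetic: the same expression has ample headroom.
    static long microsInLong(long hours) {
        return hours * 60 * 60 * 1000 * 1000;
    }

    public static void main(String[] args) {
        System.out.println(microsInInt(24));  // prints 500654080 (wrong)
        System.out.println(microsInLong(24)); // prints 86400000000 (right)
    }
}
```

Note the word "effectively" above: a 64-bit integer can still overflow, but the values needed to do so rarely occur in everyday code.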

A particularly interesting aspect of Fan is the libraries. As Brian Frank told me, “Solving the JVM/CLR portability was the easy part. The hard part was what to do with the libraries and APIs.” What the brothers did was to rethink the API sets, eliminate cruft, and use a different concept of grouping. Whereas .NET and Java both use a large number of packages that include moderate numbers of classes, Fan uses few packages that contain large numbers of classes. The result is that a developer can almost always guess correctly which package to link to for a specific need. In addition, Fan has sensible, built-in library defaults. For example, all file I/O defaults to buffered.

The good design of a language can take it only so far. To succeed, it needs good tools, good docs, and an active community. The language tools (compiler, etc.) are all open source and written in Fan. The code is clean and surprisingly readable. As to IDE support, there is currently a plugin for JetBrains IDEA and one in the very early stages for Eclipse. The Frank brothers do all their coding in regular text editors.

The documents are very good: probably the best I’ve seen for any new language at this point and far better than those of much older “new” languages, such as D. The website is well organized and elegant; and the tutorials and “cookbook” entries are clean and plentiful. It’s difficult to assess language community size in general, but more so with Fan because it does not figure on Tiobe, due I suspect to the difficulty of teasing out data for a language named Fan. For this reason and for richer Google search results, there is a move afoot to change the name of the language. Nonetheless, the community is definitely small and active. The latter aspect is due to the responsiveness of the Frank brothers to users’ questions, requests, and defect reports.

Fan solves a lot of problems elegantly. If it continues growing as it has during the past year, I anticipate it will evolve into an attractive solution for some development organizations.

The biggest challenge right now is the early stage in which most IDE plugins are currently found. A second limitation, which is about to be fixed in the upcoming point release, is that libraries and binary modules are all placed by default in the same directory. The discussion on this point, found on the language's discussion boards, shows the attentive regard of the Frank brothers for their users as they kicked around various schemes, elicited comments, and posted thoughtful replies. It's one of the most spam-free, low-noise discussion groups I've been a part of in a long while. I expect good things from this language.

Monday, January 05, 2009

The Agile Rules in HP's Original Garage

According to a recent HP poster, these were the rules in Bill Hewlett and Dave Packard's famous garage:


  • Believe you can change the world.
  • Work quickly, keep the tools unlocked, work whenever.
  • Know when to work alone and when to work together.
  • Share tools, ideas. Trust your colleagues.
  • No Politics. No bureaucracy. (These are ridiculous in a garage).
  • The customer defines a job well done.
  • Radical ideas are not bad ideas.
  • Invent different ways of working.
  • Make a contribution every day. If it doesn’t contribute, it doesn’t leave the garage.
  • Believe that together we can do anything.
  • Invent.

Curiously, it sounds like something the agile guys might have written (had they not written the manifesto). I prefer this wording because of its greater applicability and more dynamic presentation.

    Thursday, November 13, 2008

    Bob Martin's "Clean Code" Reviewed

    I have gone through "Uncle Bob" Martin's new book, Clean Code, which is a lengthy presentation of rules that will help Java developers write better code. It's similar to Kent Beck's Implementation Patterns, except more code-fixated. Clean Code has some good points, but it contains several weaknesses that seem to have gone entirely unnoticed by the reviewers on Amazon. So, here's the scoop.

    First of all, it's well hidden, but the book is only partially written by Bob Martin. Many chapters are written by other consultants who work at Martin's company--many of whom I've never heard of. The one stand-out exception is Michael Feathers, whose chapter on error handling is one of the clearest in the book. I wish he had written more.

    The main body consists primarily of explaining various coding rules that Martin calls heuristics and to which he assigns coded abbreviations for later reference. Alas, unlike patterns that have meaningful names as shortcuts, Martin chooses meaningless notations such as C2 and G26. So, "the function should do nothing but compacting[G30]" is a shortcut for the author, but a pain for the reader who has to cross-reference these references repeatedly to know what Martin is talking about.

    Unlike Beck's book, there is no theoretical framework to Martin's prescriptions. The book is a series of examples from which he teases this rule and that. Because of this lack of framework there is a certain desultory aspect--the rules come in seemingly random order.

    Some of them make you want to leap up and clap. For example, his rule that Javadoc should not contain HTML. How many times I've come to the same conclusion! I want to read comments in code easily. The small lift that HTML brings to Javadoc pages is not in any way worth the difficulty it adds to the reading of comments in code. Bob Martin's one of the first persons I've encountered to say so unequivocally.

    Other rules are good, but later contradicted. For example, Martin states that you should never leave commented-out code in place. [C5] As he points out, no one knows why it's commented out and so it remains in place forever. However, later on in an example of refactoring code per his own rules, Martin comments out large blocks of code without an explanation of how that squares with his earlier advice. (p.374)

    Martin also uses questionable coding preferences. For example, all of his code uses indents of 2 columns. 2 columns? It makes every routine look like a solid chunk of code. It's clearly not a practice to be recommended.

    A large portion of the book is an example of Martin refactoring someone else's code. He takes a long piece from an OSS project and proceeds to "improve" it. I found this section uncompelling. Perhaps because in Fowler's masterpiece Refactoring, each refactoring magically transforms the code. By comparison, Bob Martin's work seems journeyman-like. I didn't find the initial code interesting nor did I find Martin's cleaned-up version luminous. I was expecting a before-and-after scenario that would make me sit up and take notice. Instead, the exercise felt preachy, condescending at times, and ultimately not terribly convincing.

    My last gripe addresses an inexcusable error: typos. There aren't many but they are frequent enough to be distracting. For example, Martin seemingly does not understand the difference between it's and its. (p. 272, p. 296, among others) And his code contains typos too. (p. 309). This carelessness erodes credibility. Books that preach quality should be flawless at the level of spelling and grammar.

    Overall, I think some organizations can use several of Martin's heuristics as a means of boosting their in-house coding standards. But I doubt that careful coders will find much of value. Those developers will be better served by Beck's Implementation Patterns, which is based on principles and so communicates much more information in fewer words. Since my review of Beck's book, I must confess my admiration for it has deepened, and it's the volume I would recommend if you're looking to write cleaner code.

    Sunday, September 28, 2008

    Banishing Return Status Codes

    The most enduringly popular post on this blog is Perfecting OO's Small Classes and Short Methods, which presents a short series of stringent guidelines to help an imperative-trained developer master OO.

    If I were to add one item to the list, it would be: Don't use return codes to indicate the status of an action. Developers trained in languages such as C have the habit of using return codes to indicate the success or the nature of failure of the work done by a function. This approach is used because of the lack of a structured exception mechanism. But when exceptions are part of the language, the use of status codes is a poor choice. Among the key reasons are: many status codes are easily ignored; developers will expect problems to be reported via the exception mechanism; exceptions are much more descriptive. And finally, exceptions enable return values to be used for something useful--namely returning a data item.
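A small side-by-side sketch may help. Both methods below parse a port number; the names, the error codes, and the out-parameter trick are all hypothetical, chosen only to show the two styles.

```java
// Hypothetical comparison of status-code style and exception style.
final class StatusVsException {

    // C-style: the return value is a status, so the result needs an out-parameter.
    // 0 = success, -1 = not a number, -2 = out of range (codes invented for this sketch).
    static int parsePortWithStatus(String text, int[] result) {
        int value;
        try {
            value = Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            return -1;
        }
        if (value < 1 || value > 65535) {
            return -2;
        }
        result[0] = value;
        return 0;
    }

    // Exception style: the return value is the data; problems cannot be silently ignored.
    static int parsePort(String text) {
        int value;
        try {
            value = Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("port is not a number: '" + text + "'", e);
        }
        if (value < 1 || value > 65535) {
            throw new IllegalArgumentException("port out of range: " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        int[] out = new int[1];
        parsePortWithStatus("70000", out);     // status ignored; out[0] is still 0
        System.out.println(out[0]);            // prints 0
        System.out.println(parsePort("8080")); // prints 8080
    }
}
```

The first call in main shows the failure mode: nothing forces the caller to look at the status, so a bad input quietly becomes port 0.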

    Astute readers will note that in Java, null is frequently used as a return value to indicate a problem (as in Collections). This practice subverts the previous points, and it too should be avoided. Returning a null presents code with many problems it should not have to face. The first is the risk of a null-pointer blow-up because the return value was accessed without being checked. This leads to the code bloat of endless null value checks. A much better solution, which avoids this problem, is to return an empty item (empty string, empty collection, etc.). This too communicates that no data item fulfilled the function's mandate, but it does not risk the null-pointer problem, and it frequently requires no special code to handle the error condition.
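Here is a minimal sketch of the empty-item approach, using a hypothetical lookup method of my own. Because the method never returns null, the caller can iterate over the result directly, and the "nothing found" case needs no special handling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: return an empty collection rather than null.
final class EmptyNotNull {

    // Returns every word that starts with the prefix; possibly empty, never null.
    static List<String> findStartingWith(List<String> words, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String word : words) {
            if (word.startsWith(prefix)) {
                matches.add(word);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("alpha", "beta", "alto");

        // No null check needed: with no matches, the loop body simply never runs.
        for (String match : findStartingWith(words, "z")) {
            System.out.println(match);
        }
        System.out.println(findStartingWith(words, "al")); // prints [alpha, alto]
    }
}
```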

    Hence, if your OO code is characterized by heavy reliance on return codes (many of which I am certain are not checked), consider rewriting it in favor of exceptions and use return statements solely for returning non-null data items.

    Monday, September 01, 2008

    A Parameter-Validation Smell and a Solution

    Last week, Jeff Fredrick and I did a day-long code review of Platypus. We used a pair-programming approach, with Jeff driving and me helping with the navigation. Eventually, we got into the input parser, which parses input lines into a series of tokens: text, commands, macros, and comments. Macros can require a second parsing pass, and commands often require additional parsing of parameters.

    Once you get a parser working well (that is, it passes unit and functional tests, and it handles errors robustly), you generally don't want to mess with refactoring it. Experience tells you that parsers have hideous code in them and wisdom tells you to leave it alone. However, we launched in.

    A frequent cause of otiose code was my extensive parameter checking. Parameters were validated at every step as tokens passed through multiple levels of parsing logic. Likewise, the movement of the parse point was updated multiple times as the logic resolved itself back up the processing stack. This too had to be validated repeatedly.

    Jeff came up with an elegant refactoring that I could not find in the usual sources. He created an inner class consisting of the passed variables, a few methods for validating them, and a few more methods for manipulating them.

    This class was then passed to the methods in lieu of the individual parameters--thereby reducing the number of parameters to one or two. And because the class constructor verified the initialization of the fields, I needed only to check whether the passed object was null, rather than validate each of the internal fields.
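
    The shape of the refactoring can be sketched as follows. This is not the actual Platypus code: LineParser, ParseContext, and every method name here are assumptions chosen for illustration, and the real class bundled whatever variables the parser was passing around.

```java
// Hypothetical sketch of the refactoring; not the Platypus source.
class LineParser {

    // Nested class bundling the values that used to be passed and
    // validated separately: the input line and the parse point.
    static final class ParseContext {
        private final String line;
        private int parsePoint;

        // All validation happens once, here in the constructor.
        ParseContext(String line, int parsePoint) {
            if (line == null) {
                throw new IllegalArgumentException("line must not be null");
            }
            if (parsePoint < 0 || parsePoint > line.length()) {
                throw new IllegalArgumentException(
                    "parse point out of range: " + parsePoint);
            }
            this.line = line;
            this.parsePoint = parsePoint;
        }

        boolean atEnd() {
            return parsePoint >= line.length();
        }

        // Callers should check atEnd() before calling current().
        char current() {
            return line.charAt(parsePoint);
        }

        // The only way to move the parse point, so it stays in range.
        void advance(int count) {
            if (count < 0 || count > line.length() - parsePoint) {
                throw new IllegalArgumentException(
                    "cannot advance by " + count);
            }
            parsePoint += count;
        }

        int parsePoint() {
            return parsePoint;
        }
    }

    // One parameter and one null check replace per-field validation.
    String nextWord(ParseContext context) {
        if (context == null) {
            throw new IllegalArgumentException("context must not be null");
        }
        StringBuilder word = new StringBuilder();
        while (!context.atEnd() && context.current() != ' ') {
            word.append(context.current());
            context.advance(1);
        }
        return word.toString();
    }
}
```

    Because the parse point can change only through advance(), every parsing method that receives a ParseContext can trust that it is within the bounds of the line.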

    The effect was to reduce complexity of already complex code, enforce DRY, and place the validation of the variables inside a class that contained them--a set of small, but important improvements. And like many of the best refactorings, it seems obvious in retrospect.

    So, if you find your class's methods are repeatedly validating the same parameters, try bundling them in an inner class along with their validation logic. You'll like the results.

    Tuesday, June 03, 2008

    The Handiest Java Book in Years.


    One of the constant challenges I have as a Java developer is keeping up with the numerous good FOSS dev tools. I no sooner start testing one tool and adapting my project to it than a new one comes along. Because I am an analyst and naturally curious, each new product (or new release) represents a constant temptation. Is it better than what I am using? How much effort is required to try it out? What does it do better? On and on.

    I can put a lot of those concerns to rest now. I just received a copy of Java Power Tools from O'Reilly and it's exactly what I've been looking for. It contains deep explanations of the principal FOSS dev tools in 10 major categories. These explanations are not two- or four-page summaries, but in-depth expositions that provide crucial info on the strengths and weaknesses of the product. The author, John Smart, then provides a detailed tutorial on using the product. It's clear he's spent lots of time exploring the dark corners of each tool. And he makes good use of that knowledge in his comparisons and comments on the products.

    If you want to spend an hour or so coming up to speed on what a product is about before installing it (and without having to work through the usually limited docs), this book will get you there faster and enable you to get an overview of a whole lot of tools quickly and with the assurance that you have a clear understanding. Here are the tools that are covered, followed by the number of pages for each one in parentheses:

    BUILD TOOLS: Ant (55), Maven (60)
    SCM: CVS (20), Subversion (78)
    CI: Continuum (24), Cruise Control (19), LuntBuild (32), Hudson (19)
    IM: Openfire (12)
    UNIT TESTING: JUnit (20), TestNG (25), Cobertura (17)
    OTHER TESTING: StrutsTestCase (10), DbUnit (44), JUnitPerf (10), JMeter (20), SoapUI (22), Selenium (30), Fest (9)
    PROFILING: with Sun tools (16), with Eclipse (15)
    DEFECT MANAGEMENT: Bugzilla (20), Trac (35)
    QUALITY: Checkstyle (20), PMD (18), FindBugs (12), Jupiter (18), Mylyn (14)

    All told, 856 pages of crisp, well-written explanations. A must-have reference for the bookshelf.

    Thursday, May 22, 2008

    Is the popularity of unit tests waning?

    Before getting into my concerns about whether unit testing's popularity has peaked, let me state that I think unit testing is the most important benefit wrought by the agile revolution. I agree that you can write perfectly good programs without unit tests (we did put man on the moon in 1969, after all), but for most programs of any size, you're likely to be far better off using unit tests than not.

    The problem is that only a small subset of developers understand that. And recent data points suggest that the number of programmers who use unit tests is not exactly growing quickly. I'll list some of the data points below that I've been developing for my column in SD Times.

    1) Commercial products on the wane. Agitar was a company whose entire fate was tied to the popularity of unit testing. Despite very good products, a free service to auto-generate unit tests for your code, and some terrific exponents (especially Alberto Savoia and Jeff Fredrick) to tell their story, the company closed down a few weeks ago, essentially having come to the conclusion that it could never be sold at a price that could repay investors. So rather than ask for more funding, it closed down. If unit testing were gaining popularity robustly, Agitar surely would have come to a different conclusion.

    2) Few OSS products. Except for the xUnit frameworks themselves, few FOSS tools for unit testing have been adopted. The innovative Jester project, which built a tool that looked for untested or poorly tested logic, essentially stopped development a long time ago because, to quote the founder, Ivan Moore, in a comment to me, "so few sites are into unit testing enough to care about perfecting their tests."

    3) Major Java instructors aren't teaching it. Consider this interview with Cay Horstmann, co-author of the excellent Core Java books. (He asks, "If so many experienced developers don't write unit tests, what does that say?" In speculating on an answer, he implies that good developers don't need unit tests. Ugh!)

    4) Unit testing books are few and far between. I am seeing about one new one a year. And as yet, not a single book on JUnit 4, which has been out for nearly three years(!).

    5) Alternative unit-testing frameworks, such as the excellent TestNG, are essentially invisible. I was at a session on scripting this spring at SD West and in a class of 30 or so, two people had heard of TestNG (the teacher and me).

    I could speculate on causes, but I have no clear culprit to point to. Certainly, unit testing needs to be evangelized more. And evangelized correctly. The folks who insist on 100% code coverage are making a useful tool unpalatable to serious programmers (as discussed here by Howard Lewis Ship, the inventor of Tapestry). But, I think the cause has to be something deeper than this. I would love to hear thoughts from readers in real-world situations where unit testing has been abandoned, cut back, or simply rejected--and why.

    It would be a shame to have unit testing disappear and its current users viewed as aging, pining developers hankering for a technology the world has largely passed by. That would return programmers to the tried-and-true practice of glassy-eyed staring at a debugger for hours--something I have not missed at all.