Code Monkeyism

Programming is hard by Stephan Schmidt

Funny, some Rubyists are stupider than a piece of wood

Take this one, for example. He proposes a simple way to make Java developers angry by showing them some code. His example is

File.open('server.cert').readlines[1..-2].join.gsub(/\n/, '')

and throws a challenge:

Anyhoo, all you have to do now is find someone who uses a big stupid language and throw an example like above to their face and tell them to beat it. See if they can write it more elegantly using their language. The beauty of the trick is that there’s no way in this world that’s gonna happen.

My 20-minute Java version looks like this, more or less identical:

File.open("server.cert").readlines(1, -2).join().gsub("\n", "")

Poor blogger, another fanboy without a clue who confuses languages and API design - in this case method chaining.

Because there’s a very high probability that the other person, yes, the one who’s using a big stupid language, will get so angry and beat you up.

Nope, I just write the code in Java. That’s all.

(Besides that, the one-liner is hard to read, hard to maintain and hard to reuse. Fine if you’re the sole developer on a project, bad if there are 50 others who need to maintain your code. I’d prefer a CertStripper ;-)
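The post only names the CertStripper. What follows is a minimal sketch of one, assuming a PEM-style certificate file whose first and last lines are the header and footer to drop. The method names and the split into two `strip` overloads are hypothetical:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical CertStripper: returns everything between the first and the last
// line of a certificate file as one string without line breaks.
final class CertStripper {

    // Reads the file; Files.readAllLines already removes \n, \r\n and \r terminators.
    String strip(Path certFile) throws IOException {
        return strip(Files.readAllLines(certFile, StandardCharsets.UTF_8));
    }

    // Separate overload so the logic can be reused and tested without a file.
    String strip(List<String> lines) {
        if (lines.size() < 2) {
            return "";  // no body between a header and a footer line
        }
        List<String> body = lines.subList(1, lines.size() - 1);  // drop header and footer
        return String.join("", body);
    }
}
```

Keeping file reading separate from stripping makes the logic reusable. Also, `readAllLines` removes `\r\n` terminators entirely, while the Ruby `gsub(/\n/, '')` would leave a stray `\r` on Windows-style files.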

(As a second side note: where is the fluent interface for Google Collections? I needed one today.)

Update: Someone wrote a comment on the linked blog post, copying parts of this post. And because someone asked: no, this cannot be done with the JDK alone, therefore it’s an API problem. You need to write some code of your own to wrap JDK classes fluently. See the fluent interface link mentioned above for how this can be done for an existing API.
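One way to get the Java one-liner from above is a thin fluent wrapper around the JDK. The sketch below assumes Java 8+ and invents three small classes (`File`, `Lines`, `Text`) purely for illustration. None of them is a JDK API, and the range handling only approximates Ruby’s `[1..-2]`:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;

// Hypothetical fluent wrapper around java.nio.file.Files; not a JDK API.
// Named File to mirror the one-liner, so java.io.File must not be imported next to it.
final class File {
    private final String path;

    private File(String path) { this.path = path; }

    static File open(String path) { return new File(path); }

    // Ruby-style inclusive range: negative indices count from the end (-1 is the last line).
    // Out-of-range indices yield no lines (a simplification of Ruby's rules).
    Lines readlines(int from, int to) {
        try {
            List<String> all = Files.readAllLines(Paths.get(path), StandardCharsets.UTF_8);
            int start = from < 0 ? all.size() + from : from;
            int end = to < 0 ? all.size() + to : to;
            if (start < 0 || end >= all.size() || start > end) {
                return new Lines(Collections.<String>emptyList());
            }
            return new Lines(all.subList(start, end + 1));
        } catch (IOException e) {
            throw new UncheckedIOException(e);  // keeps the chain free of checked exceptions
        }
    }
}

final class Lines {
    private final List<String> lines;

    Lines(List<String> lines) { this.lines = lines; }

    // readAllLines strips terminators; rejoin with "\n" so gsub has newlines to remove,
    // as in the Ruby version.
    Text join() { return new Text(String.join("\n", lines)); }
}

final class Text {
    private final String value;

    Text(String value) { this.value = value; }

    // Literal (non-regex) replacement, like Ruby's gsub with a string pattern.
    String gsub(String target, String replacement) { return value.replace(target, replacement); }
}
```

Each method returns a new small wrapper object, which is what makes the chain possible; the JDK classes underneath stay unchanged.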

Sharding destroys the goals of your relational database

Sharding does destroy your relational database, which is a good thing. The idea behind sharding is to distribute data across several databases based on certain criteria, for example the primary key: all entities whose keys begin with 1 go to one database, those beginning with 2 to another, and so on (often modulo functions on the key are used, or groups based on business data like customer location or function). Several reasons exist for sharding, the main two being better performance and a lower impact of crashed databases: if you shard by name, only persons whose names start with S are affected when the S database crashes.
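A minimal sketch of the modulo variant, assuming numeric primary keys and a fixed list of shards. `ShardRouter` and the shard names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical router: maps a numeric primary key to one of N shards with a modulo function.
final class ShardRouter {
    private final List<String> shards;  // e.g. one connection URL per shard (illustrative)

    ShardRouter(List<String> shards) {
        if (shards.isEmpty()) {
            throw new IllegalArgumentException("need at least one shard");
        }
        this.shards = new ArrayList<>(shards);
    }

    // Math.floorMod keeps the index non-negative even for negative keys.
    int shardIndex(long primaryKey) {
        return (int) Math.floorMod(primaryKey, (long) shards.size());
    }

    String shardFor(long primaryKey) {
        return shards.get(shardIndex(primaryKey));
    }
}
```

With three shards, key 7 lands on shard index 1 (7 mod 3). Every read and write for that entity then goes to that one database only.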

Relational databases have been the tool of choice for data storage for several decades. But they do more than store data. Even read operations serve several functions. There are at least three kinds of database read queries:

  • Data graph building queries: With these you get your data out of the database, customers together with addresses etc.
  • Aggregation queries: How many orders were stored in August, aggregated by product category?
  • Search queries: Give me all customers who live in New York

Sharding now does away with the second and third kind of query and reduces databases to data storage. Because the shards are different databases on different systems, you can’t run aggregation queries across them without custom code (unlike with a cluster), and you cannot search with one query, only with several: one to each database. Databases have led to the notion that search and retrieval are linked together and should be dealt with together. Most people think of retrieval and search as the same thing. This has blocked the development of technologies. Sharding, S3, Dynamo and Memcached have changed this perception recently. I’ve written about splitting search and retrieval in “The unholy legacy of databases”. There I quote Rickard of Qi4j fame:
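To illustrate the custom code this takes, here is a minimal scatter-gather sketch of the August aggregation query, assuming each shard can be queried independently. The shards are simulated as in-memory lists, and `Order` and `ShardedOrders` are hypothetical:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a row in the orders table.
final class Order {
    final String category;
    final int month;  // 1..12

    Order(String category, int month) {
        this.category = category;
        this.month = month;
    }
}

final class ShardedOrders {
    private final List<List<Order>> shards;  // each inner list simulates one shard

    ShardedOrders(List<List<Order>> shards) { this.shards = shards; }

    // Scatter: run the same aggregation on every shard.
    // Gather: merge the partial results in application code.
    // With real databases, each loop iteration would be a separate GROUP BY query.
    Map<String, Integer> countByCategory(int month) {
        Map<String, Integer> total = new HashMap<>();
        for (List<Order> shard : shards) {
            Map<String, Integer> partial = new HashMap<>();
            for (Order order : shard) {
                if (order.month == month) {
                    partial.merge(order.category, 1, Integer::sum);
                }
            }
            partial.forEach((category, count) -> total.merge(category, count, Integer::sum));
        }
        return total;
    }
}
```

Counts merge by simple addition; aggregates such as averages or distinct counts need more care when combining partial results.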

Entities are really cool. We have decided to split the storage from the indexing/querying, sort of like how the internet works with websites vs Google, which makes it possible to implement really simple storages. Not having to deal with queries makes things a whole lot easier.

and have concluded

Free your mind! Storage and search are two different things, if you split them, you gain flexibility.

People have talked about splitting storage and search for some time now. Search engines like Lucene have driven searching out of databases. But the notion of store & search still prevails. Sharding, as a mechanism for more performance and lower risk, will move into many web companies, reduce databases to a storage mechanism and drop the aggregation (data warehouse and reporting) and search parts. Those can be better served by real data warehouse servers like Mondrian and by search services based on Lucene or semantic engines like Sesame. And storage might move from databases to simple storage like Amazon Elastic Block Store or JDBM.
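A minimal sketch of storage and search as two separate components. It assumes an in-memory map as the storage and a hand-rolled inverted index standing in for a search service such as Lucene. All class names are hypothetical, and no Lucene API is used:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical storage: put/get by primary key only, no queries.
final class CustomerStore {
    private final Map<Long, String> nameById = new HashMap<>();
    void put(long id, String name) { nameById.put(id, name); }
    String get(long id) { return nameById.get(id); }
}

// Hypothetical search index: maps a search term (here: the city) to primary keys only.
final class CityIndex {
    private final Map<String, Set<Long>> idsByCity = new HashMap<>();
    void add(String city, long id) {
        idsByCity.computeIfAbsent(city, c -> new TreeSet<>()).add(id);
    }
    Set<Long> idsIn(String city) {
        return idsByCity.getOrDefault(city, Collections.<Long>emptySet());
    }
}

final class Customers {
    private final CustomerStore store = new CustomerStore();
    private final CityIndex index = new CityIndex();

    // Writes go to both components; keeping them in sync is the application's job here
    // (updates that move a customer to another city are not handled in this sketch).
    void save(long id, String name, String city) {
        store.put(id, name);
        index.add(city, id);
    }

    // Search returns keys, retrieval turns keys into data: two separate steps.
    List<String> namesIn(String city) {
        List<String> names = new ArrayList<>();
        for (long id : index.idsIn(city)) {
            names.add(store.get(id));
        }
        return names;
    }
}
```

The storage side never has to answer “who lives in New York?”; it only has to look up keys, which is what keeps it simple.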

Thanks for listening, and think about your databases.

400 reader milestone

After my 200 reader milestone in January and 300 in May, this blog has reached the 400 FeedBurner reader milestone. Thanks to all the regular readers of this blog for listening :-)

Who wins the Olympics?

Considering the fact that the swoosh is the most shown icon of the Olympics, guess who has won?

My MacBook Pro is kaputt

… for several days now :-(

Hg gets rebase

Hg gets rebase. Though judging from the comments it’s not yet up to Git’s, add “Killer features for Git are git rebase [...] ;s/Git/Mercurial/g” to the Mercurial part of my comparison post and remove “[...] no rebase [...]”.

Response to the critique for my last post and OneElementIterator

I’ve written an update to the post where someone suggested in a trackback using the JDK for a one-element iterator.

I got interested in a OneElementIterator, which, optimized (I’m not sure how fast try is), could look like this:

import java.util.Iterator;

public class OneElementIterator<T> implements Iterator<T> {
  private T element;

  public OneElementIterator(T element) {
    this.element = element;
  }

  public boolean hasNext() {
    return element != null;
  }

  public T next() {
    try {
      return element;
    } finally {
      element = null;
    }
  }
}

Faster and shorter ideas?

Update: The remove() method got lost during cut & paste:

  public void remove() {
    // not supported, throw exception
    throw new UnsupportedOperationException("Remove not supported in OneElementIterator");
  }

And, as Eugene noted, next() should throw a NoSuchElementException when there is no element left.
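Folding both updates into the null-sentinel version gives something like the following sketch. It keeps the try/finally trick from above; note that with a null sentinel, a `null` element makes the iterator empty from the start:

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Sketch: the null-sentinel OneElementIterator with remove() and the
// NoSuchElementException fix folded in.
class OneElementIterator<T> implements Iterator<T> {
    private T element;  // null means "already consumed" (or "nothing to iterate")

    public OneElementIterator(T element) {
        this.element = element;
    }

    public boolean hasNext() {
        return element != null;
    }

    public T next() {
        if (element == null) {
            throw new NoSuchElementException("OneElementIterator is exhausted");
        }
        try {
            return element;
        } finally {
            element = null;  // runs after the return value has been computed
        }
    }

    public void remove() {
        throw new UnsupportedOperationException("Remove not supported in OneElementIterator");
    }
}
```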

“For” hack with Option monad in Java

There has been some discussion going on in the blogosphere about monads, especially about the Haskell Maybe monad and the Scala Option class. Both are ways to prevent problems with null and NPEs, something Java lacks. Java returns null from many methods to indicate failure or no result. Suppose we have a method which returns a name:

String name = getName("hello");
int length = name.length();

The problem with this method is that we don’t know whether it returns null. So the developer needs to deal with name == null, though the compiler doesn’t force the developer to handle the returned null. A lazy developer then ends up with null pointer exceptions. Other languages deal with this problem in different ways: Groovy has the safe navigation operator (?.) and Nice has option types.

Some posts, like the one from James, show how to use options in Java. All of them have the problem that you need to unwrap the value inside the Option / Maybe yourself. Scala and Haskell do that for you, for example with pattern matching on case classes in Scala.

But there is a construct in Java with syntactic sugar that unwraps the value from inside another class: the enhanced for loop introduced in Java 1.5.

Option<String> option = getName("hello");

for (String name: option) {
	// do something with name
}

To make this work we need our option class to implement Iterable.

public abstract class Option<T> implements Iterable<T> {
}

And the None and Some subclasses need to return an empty iterator

import java.util.Collections;
import java.util.Iterator;

public final class None<T> extends Option<T> {
  public Iterator<T> iterator() { return Collections.<T>emptyIterator(); }
}

or an iterator with one item.

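import java.util.*;

public final class Some<T> extends Option<T> {
  private final T value;
  public Some(T value) { this.value = value; }
  public T value() { return value; }
  public Iterator<T> iterator() {
    // or better use google-collections
    List<T> list = new ArrayList<T>();
    list.add(this.value);
    return list.iterator();
  }
}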

Then, voilà, Java does the unwrapping for us, and instead of

Option<String> option = getName("hello");
if (option instanceof Some) {
    String name = ((Some<String>) option).value();
} else { ... }

we can write (sacrificing the else):

for (String name: getName("hello")) {
	// do something with name
}
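An end-to-end sketch of the hack, assuming Java 7+ for `Collections.emptyIterator()`. `Option.of`, `getName` and the `NAMES` map are hypothetical helpers added for illustration; `Option.of` wraps a null-returning call such as `Map.get`, so callers never see the null:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Sketch of the Option-as-Iterable hack; Option.of, getName and NAMES are hypothetical.
abstract class Option<T> implements Iterable<T> {
    // Bridges null-returning APIs into an Option.
    static <T> Option<T> of(T valueOrNull) {
        return valueOrNull == null ? new None<T>() : new Some<T>(valueOrNull);
    }
}

final class None<T> extends Option<T> {
    public Iterator<T> iterator() { return Collections.<T>emptyIterator(); }
}

final class Some<T> extends Option<T> {
    private final T value;
    Some(T value) { this.value = value; }
    public Iterator<T> iterator() { return Collections.singletonList(value).iterator(); }
}

final class OptionDemo {
    private static final Map<String, String> NAMES = new HashMap<>();
    static { NAMES.put("hello", "World"); }

    // Map.get returns null for unknown keys; callers only ever see an Option.
    static Option<String> getName(String key) {
        return Option.of(NAMES.get(key));
    }

    // The enhanced for loop unwraps the value: the body runs once for Some, never for None.
    static int nameLength(String key) {
        int length = -1;
        for (String name : getName(key)) {
            length = name.length();
        }
        return length;
    }
}
```

`nameLength("hello")` runs the loop body once and returns 5; for an unknown key the body never runs and the method returns -1.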

Thanks for listening.

Update: Completely ignoring the point of this post, Euxx posted a correction of how I created the single-element list.

  return Collections.singletonList(this.value).iterator();

This happens all the time in IT: people miss the point but nitpick something irrelevant. Other than that, if we start arguing about performance (“The java.util.Collections class also have number of other utility methods that allow to shorten code like this and help to improve performance of your application.”), I’d write (or reuse) a OneElementIterator, something like this (it could probably be optimized with some further thought):

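import java.util.Iterator;
import java.util.NoSuchElementException;

public class OneElementIterator<T> implements Iterator<T> {

	private boolean done = false;
	private T element;

	public OneElementIterator(T element) {
		this.element = element;
	}
	public boolean hasNext() {
		return ! done;
	}
	public T next() {
		if (done) {
			throw new NoSuchElementException();
		}
		done = true;
		return element;
	}
	public void remove() {
		// not supported, throw exception
		throw new UnsupportedOperationException("Remove not supported in OneElementIterator");
	}
}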

(Or again using google collections)

“I can’t not notice the ugly use of ArrayList to create collection with a single element, but what made matter worse, is that author suggested to use 3rd party library to replace those 3 lines of code.”

As far as I know, the JDK does not help you with Iterators or Iterables nearly as much as third-party libraries do. So, yes, I’d suggest using a third-party library to implement an Iterator/Iterable for Option.