Code Monkeyism

Programming is hard by Stephan Schmidt

Problems with Jersey, REST, JSON and UTF-8 [Update]

UTF-8 is always a problem. Unbelievable. It’s 2008 and we still haven’t fixed this. One of my current projects is a JavaScript frontend with a REST backend. The backend stores data in MySQL (a famous UTF-8 troublemaker) and returns JSON in response to REST calls. The problem starts with UTF-8 characters. Somewhere in the call chain, as always, characters don’t get written correctly. MySQL and the JDBC driver should work, the JSP page is UTF-8 (both the @page directive and the meta http-equiv tag), jQuery (which does the AJAX) and JavaScript do know UTF-8, and Jersey should be UTF-8 too. But after some experiments I’m now quite sure that Jersey (the JSR 311 REST framework) is to blame. I’m not sure how to specify UTF-8; this

  @ProduceMime("text/plain;charset=UTF-8")

doesn’t help. Funny: every major project with several frameworks along the call chain and several languages (JS, C, Java) somehow runs into UTF-8 problems. I’m so fed up with this; it’s 2008.
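To illustrate what "characters don’t get written correctly" usually looks like, here is a small sketch in plain Java (no Jersey, JSP or MySQL involved; the class and method names are made up for illustration). One step encodes text as UTF-8 and the next step decodes those bytes with a different charset, here ISO-8859-1. A single mismatched link like this is enough to garble every non-ASCII character.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: simulates two links of a call chain.
class MojibakeDemo {

    // The first link turns text into bytes with writerCharset,
    // the second link turns those bytes back into text with readerCharset.
    static String transfer(String text, Charset writerCharset, Charset readerCharset) {
        byte[] bytes = text.getBytes(writerCharset);
        return new String(bytes, readerCharset);
    }

    public static void main(String[] args) {
        // "Mueller" with u-umlaut, written as a Unicode escape so the
        // source file's own encoding cannot corrupt the literal.
        String original = "M\u00FCller";

        // Both links agree on UTF-8: the text survives unchanged.
        String ok = transfer(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);

        // Writer uses UTF-8, reader assumes ISO-8859-1: the two UTF-8 bytes
        // of U+00FC (0xC3 0xBC) are read back as two separate characters,
        // U+00C3 and U+00BC.
        String broken = transfer(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println("same charset preserved text: " + ok.equals(original));
        System.out.println("mismatched charset preserved text: " + broken.equals(original));
        System.out.println("garbled length: " + broken.length());
    }
}
```

Running it prints `true`, `false` and `7`: the six-character "Müller" comes back as a seven-character string because each byte of the two-byte UTF-8 "ü" became its own character. `StandardCharsets` requires Java 7 or later.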

Update: Jersey uses InputStreams for all encodings, and StringProvider in particular is relevant to me (see above). Does this work with Unicode?
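What a String writer would need to do is fairly small: read the charset parameter from the media type (such as the `text/plain;charset=UTF-8` in the @ProduceMime value above), encode with that charset instead of the platform default, and fall back to UTF-8 when no charset is given or the named one is unsupported. The following is a minimal sketch of that idea under those assumptions; `CharsetAwareStringWriter` and its methods are hypothetical and this is not Jersey’s StringProvider or its API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical helper, not part of Jersey: writes a String entity using the
// charset named in the media type, falling back to UTF-8.
class CharsetAwareStringWriter {

    // Extracts the charset parameter from a media type such as
    // "text/plain;charset=UTF-8". Returns UTF-8 if the parameter is
    // missing, malformed, or names a charset this JVM does not support.
    static Charset charsetOf(String mediaType) {
        for (String param : mediaType.split(";")) {
            String[] kv = param.trim().split("=", 2);
            if (kv.length == 2 && kv[0].trim().equalsIgnoreCase("charset")) {
                String name = kv[1].trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                    return StandardCharsets.UTF_8;
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Encodes the text with the resolved charset, never the platform default.
    static byte[] encode(String text, String mediaType) {
        return text.getBytes(charsetOf(mediaType));
    }

    // Writes the encoded bytes; closing the stream is left to the caller.
    static void writeTo(String text, String mediaType, OutputStream out) throws IOException {
        out.write(encode(text, mediaType));
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeTo("M\u00FCller", "text/plain;charset=UTF-8", out);
        // "M" + two bytes for U+00FC + "ller" = 7 bytes
        System.out.println(out.size());
    }
}
```

The design choice here is that the charset travels with the media type, so the same value that goes into the Content-Type header also drives the byte encoding, and the two cannot drift apart.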

About the author: Stephan Schmidt is currently a team manager at ImmobilienScout24 in Berlin. Stephan has worked as a head of development and as CTO. He has used many different technologies over the last 20 years, including Java, Rails and Python. Stephan’s main field of interest is maintainability and productivity in software development. All views are his own.


Comments


Hi Stephan,

Yes, I think this is a problem with Jersey; thanks for reporting it.

The expert group (EG) recently found this issue as well, and we have updated the JSR-311 specification, version 0.7 [1], to state:

When writing responses, implementations SHOULD respect application-supplied character set metadata and SHOULD use UTF-8 if a character set is not specified by the application or if the application specifies a character set that is unsupported.

and this will be implemented in the 0.7 release of Jersey (scheduled for April 18th).

I should be able to provide you with a specific solution for StringProvider fairly quickly if you are happy working with the latest builds or the trunk.

Paul.

[1] https://jsr311.dev.java.net/
