<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.3.2" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>
<channel>
	<title>Comments on: The unholy legacy of databases</title>
	<link>http://stephan.reposita.org/archives/2008/04/30/the-unholy-legacy-of-databases/</link>
	<description>Productivity in software development</description>
	<pubDate>Fri, 25 Jul 2008 03:30:21 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.3.2</generator>
		<item>
		<title>By: Rickard</title>
		<link>http://stephan.reposita.org/archives/2008/04/30/the-unholy-legacy-of-databases/#comment-84973</link>
		<dc:creator>Rickard</dc:creator>
		<pubDate>Mon, 05 May 2008 08:02:34 +0000</pubDate>
		<guid>http://stephan.reposita.org/archives/2008/04/30/the-unholy-legacy-of-databases/#comment-84973</guid>
		<description>Part of the point of splitting storage and query is that it becomes easier to do cross-storage queries. If you store objects in many places, but index/query them in one (again, the website vs Google analogy), it becomes trivial to query stuff in different places. The LDAP database problem you outline is one of the cases I had in mind when I designed these APIs in Qi4j, because I want to be able to do the same thing. In Qi4j our primary indexer is going to be Sesame2 (i.e. RDF), with SPARQL as the main query language (although it's usually hidden under a domain-oriented Java API). It will be very interesting to see how it works out.</description>
		<content:encoded><![CDATA[<p>Part of the point of splitting storage and query is that it becomes easier to do cross-storage queries. If you store objects in many places, but index/query them in one (again, the website vs Google analogy), it becomes trivial to query stuff in different places. The LDAP database problem you outline is one of the cases I had in mind when I designed these APIs in Qi4j, because I want to be able to do the same thing. In Qi4j our primary indexer is going to be Sesame2 (i.e. RDF), with SPARQL as the main query language (although it&#8217;s usually hidden under a domain-oriented Java API). It will be very interesting to see how it works out.</p>
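<p>A minimal sketch of the "store in many places, index in one" idea, under loud assumptions: <code>DictStore</code> and <code>CentralIndex</code> are hypothetical names invented for illustration, the stores are plain in-memory dictionaries, and none of this is the Qi4j, Sesame2 or SPARQL API.</p>

```python
class DictStore:
    """Hypothetical stand-in for one storage backend (LDAP, RDBMS, ...)."""

    def __init__(self, name):
        self.name = name
        self._objects = {}

    def put(self, key, obj):
        self._objects[key] = obj

    def get(self, key):
        return self._objects[key]


class CentralIndex:
    """One index over every store: (attribute, value) maps to (store name, key)."""

    def __init__(self):
        self._stores = {}
        self._entries = {}

    def save(self, store, key, obj):
        # Write to the owning store, then index each attribute centrally.
        store.put(key, obj)
        self._stores[store.name] = store
        for attribute, value in obj.items():
            self._entries.setdefault((attribute, value), set()).add((store.name, key))

    def find(self, attribute, value):
        # Query the index only, then fetch each hit from whichever store holds it.
        hits = sorted(self._entries.get((attribute, value), set()))
        return [self._stores[name].get(key) for name, key in hits]


# Two separate stores, one shared index (all data here is made up).
ldap = DictStore("ldap")
sql = DictStore("sql")
index = CentralIndex()
index.save(ldap, "uid=alice", {"kind": "student", "name": "Alice", "student_id": 1})
index.save(sql, "class:42", {"kind": "class", "title": "Databases", "student_id": 1})

# A single query returns objects that live in different stores.
related = index.find("student_id", 1)
```

<p>The caller never needs to know which store holds an object; the trade-off in this sketch is that every write has to go through the index so the two stay in sync.</p>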
]]></content:encoded>
	</item>
	<item>
		<title>By: stephan</title>
		<link>http://stephan.reposita.org/archives/2008/04/30/the-unholy-legacy-of-databases/#comment-83912</link>
		<dc:creator>stephan</dc:creator>
		<pubDate>Sat, 03 May 2008 05:51:28 +0000</pubDate>
		<guid>http://stephan.reposita.org/archives/2008/04/30/the-unholy-legacy-of-databases/#comment-83912</guid>
		<description>@Laird: Your query builder doesn't sound too ugly, but I haven't seen the code :-)

&lt;i&gt;"I was hoping that someone somewhere smarter than me had figured out how to bridge querying disparate systems in a better way."&lt;/i&gt;

Uh, smarter? Then I'm most likely not the right person.

Perhaps the joins are only needed for reporting. If that is the case, it would be best to also write the data into an OLAP system for reports.

Like http://mondrian.pentaho.org/</description>
		<content:encoded><![CDATA[<p>@Laird: Your query builder doesn&#8217;t sound too ugly, but I haven&#8217;t seen the code :-)</p>
<p><i>&#8220;I was hoping that someone somewhere smarter than me had figured out how to bridge querying disparate systems in a better way.&#8221;</i></p>
<p>Uh, smarter? Then I&#8217;m most likely not the right person.</p>
<p>Perhaps the joins are only needed for reporting. If that is the case, it would be best to also write the data into an OLAP system for reports.</p>
<p>Like <a href="http://mondrian.pentaho.org/" rel="nofollow">http://mondrian.pentaho.org/</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Laird Nelson</title>
		<link>http://stephan.reposita.org/archives/2008/04/30/the-unholy-legacy-of-databases/#comment-83613</link>
		<dc:creator>Laird Nelson</dc:creator>
		<pubDate>Fri, 02 May 2008 20:45:29 +0000</pubDate>
		<guid>http://stephan.reposita.org/archives/2008/04/30/the-unholy-legacy-of-databases/#comment-83613</guid>
		<description>I surely don't.  :-)  My main problem is that I need both--the ability to join LDAP information (Student information, say), as it happens, with additional information relevant to it from a database (classes they're taking).

The decidedly brute-force, ugly, smelly, hairy, nasty and yet strangely cool approach I took at one point was to make a kind of query builder that, in conjunction with some simple lookup and filtering facilities implemented on Storage instances--but again, not true searches--whittled down the sets of items from each Storage to be combined (so the Student Storage was able to filter using a simple where clause/predicate, and the Class Storage was able to do the same).  Then I loaded those sets into a temporary database (H2--obviously could have been anything) and did the more complicated joining there (minus the simple where clauses/filters that were used to get me the candidate sets).  (A tip of my hat to a former colleague for first exploring this approach.)  The result, of course, was dog slow and could not be used on enormous datasets, but performance was never a priority and the client understood that they would pay dearly in performance costs for this approach.  I was hoping that someone somewhere smarter than me had figured out how to bridge querying disparate systems in a better way.

Thanks for the links to the papers; very interesting reading.</description>
		<content:encoded><![CDATA[<p>I surely don&#8217;t.  :-)  My main problem is that I need both&#8211;the ability to join LDAP information (Student information, say), as it happens, with additional information relevant to it from a database (classes they&#8217;re taking).</p>
<p>The decidedly brute-force, ugly, smelly, hairy, nasty and yet strangely cool approach I took at one point was to make a kind of query builder that, in conjunction with some simple lookup and filtering facilities implemented on Storage instances&#8211;but again, not true searches&#8211;whittled down the sets of items from each Storage to be combined (so the Student Storage was able to filter using a simple where clause/predicate, and the Class Storage was able to do the same).  Then I loaded those sets into a temporary database (H2&#8211;obviously could have been anything) and did the more complicated joining there (minus the simple where clauses/filters that were used to get me the candidate sets).  (A tip of my hat to a former colleague for first exploring this approach.)  The result, of course, was dog slow and could not be used on enormous datasets, but performance was never a priority and the client understood that they would pay dearly in performance costs for this approach.  I was hoping that someone somewhere smarter than me had figured out how to bridge querying disparate systems in a better way.</p>
<p>Thanks for the links to the papers; very interesting reading.</p>
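<p>The filter-then-join approach described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the original code: Python's built-in <code>sqlite3</code> in-memory database stands in for the temporary H2 database, the two lists stand in for the Student and Class Storage instances, and <code>filter_storage</code> and <code>join_in_temp_db</code> are hypothetical helper names.</p>

```python
import sqlite3

# Hypothetical data standing in for the two Storage instances.
students = [  # as if read from LDAP
    {"uid": "alice", "name": "Alice", "year": 2},
    {"uid": "bob", "name": "Bob", "year": 1},
    {"uid": "carol", "name": "Carol", "year": 2},
]
enrollments = [  # as if read from the relational database
    {"uid": "alice", "course": "Databases", "term": "spring"},
    {"uid": "bob", "course": "Databases", "term": "spring"},
    {"uid": "carol", "course": "Compilers", "term": "fall"},
    {"uid": "alice", "course": "Compilers", "term": "fall"},
]


def filter_storage(items, predicate):
    """Simple per-storage filter: a lookup, not a true search."""
    return [item for item in items if predicate(item)]


def join_in_temp_db(student_rows, enrollment_rows):
    """Load the candidate sets into a throwaway database and join there."""
    db = sqlite3.connect(":memory:")
    try:
        db.execute("CREATE TABLE student (uid TEXT, name TEXT, year INTEGER)")
        db.execute("CREATE TABLE enrollment (uid TEXT, course TEXT, term TEXT)")
        db.executemany("INSERT INTO student VALUES (:uid, :name, :year)", student_rows)
        db.executemany("INSERT INTO enrollment VALUES (:uid, :course, :term)", enrollment_rows)
        return db.execute(
            "SELECT s.name, e.course FROM student s "
            "JOIN enrollment e ON e.uid = s.uid "
            "ORDER BY s.name, e.course"
        ).fetchall()
    finally:
        db.close()


# Step 1: whittle down each set with a simple predicate in its own storage.
second_years = filter_storage(students, lambda s: s["year"] == 2)
fall_classes = filter_storage(enrollments, lambda e: e["term"] == "fall")

# Step 2: do the actual join in the temporary database.
result = join_in_temp_db(second_years, fall_classes)
```

<p>Every candidate row is copied into the temporary database before the join runs, which is where the performance cost mentioned above comes from.</p>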
]]></content:encoded>
	</item>
	<item>
		<title>By: stephan</title>
		<link>http://stephan.reposita.org/archives/2008/04/30/the-unholy-legacy-of-databases/#comment-83600</link>
		<dc:creator>stephan</dc:creator>
		<pubDate>Fri, 02 May 2008 20:08:36 +0000</pubDate>
		<guid>http://stephan.reposita.org/archives/2008/04/30/the-unholy-legacy-of-databases/#comment-83600</guid>
		<description>@Laird: I'm not sure if this is possible. You're giving stuff up when you try this approach. But you also gain something. I guess it depends on the application you have. If flexibility in the backend is needed, then this is a good approach. If an RDBMS is all you need, then this approach is overengineered.

Have you read the transaction apostate paper and the Amazon Dynamo paper? Sometimes it isn't even possible to have all the data on one machine to join it.

http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html

http://www-db.cs.wisc.edu/cidr/cidr2007/papers/cidr07p15.pdf

But if you have new insights or a solution to the problem of joining data across disparate stores, please drop me a line.</description>
		<content:encoded><![CDATA[<p>@Laird: I&#8217;m not sure if this is possible. You&#8217;re giving stuff up when you try this approach. But you also gain something. I guess it depends on the application you have. If flexibility in the backend is needed, then this is a good approach. If an RDBMS is all you need, then this approach is overengineered.</p>
<p>Have you read the transaction apostate paper and the Amazon Dynamo paper? Sometimes it isn&#8217;t even possible to have all the data on one machine to join it.</p>
<p><a href="http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html" rel="nofollow">http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html</a></p>
<p><a href="http://www-db.cs.wisc.edu/cidr/cidr2007/papers/cidr07p15.pdf" rel="nofollow">http://www-db.cs.wisc.edu/cidr/cidr2007/papers/cidr07p15.pdf</a></p>
<p>But if you have new insights or a solution to the problem of joining data across disparate stores, please drop me a line.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Laird Nelson</title>
		<link>http://stephan.reposita.org/archives/2008/04/30/the-unholy-legacy-of-databases/#comment-83575</link>
		<dc:creator>Laird Nelson</dc:creator>
		<pubDate>Fri, 02 May 2008 18:18:20 +0000</pubDate>
		<guid>http://stephan.reposita.org/archives/2008/04/30/the-unholy-legacy-of-databases/#comment-83575</guid>
		<description>I have tried this sort of thing as well and I really really like the concept.  The only thing that stops my poor, slow, small brain from really seeing it through to its logical conclusion is the sometimes-requirement to join attributes from one Thing stored with one Storage mechanism with the attributes of another Thing stored with another Storage mechanism.  Do you have suggestions here?

Obviously, a Compass/Lucene-type search handles a huge number of cases--people tend to like to search by keywords, and so a coarse-grained search/locate strategy like that makes a lot of sense.  But in some of the applications I work on, careful targeted queries that join bits of two entities together--a classic SQL join--are also needed.

Have you found a convenient way to expose a *common* SQL-like query mechanism across items that use different Search implementations?</description>
		<content:encoded><![CDATA[<p>I have tried this sort of thing as well and I really really like the concept.  The only thing that stops my poor, slow, small brain from really seeing it through to its logical conclusion is the sometimes-requirement to join attributes from one Thing stored with one Storage mechanism with the attributes of another Thing stored with another Storage mechanism.  Do you have suggestions here?</p>
<p>Obviously, a Compass/Lucene-type search handles a huge number of cases&#8211;people tend to like to search by keywords, and so a coarse-grained search/locate strategy like that makes a lot of sense.  But in some of the applications I work on, careful targeted queries that join bits of two entities together&#8211;a classic SQL join&#8211;are also needed.</p>
<p>Have you found a convenient way to expose a *common* SQL-like query mechanism across items that use different Search implementations?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
