The Only Pattern for Data Access is - There Are No Patterns for Data Access

Over the years of software development, one thing that has eluded most developers is data access. Yes, I mean writing code that accesses a database from your application. It is an age-old problem, dating back to the days when data storage and computing were invented.

We would have thought that by now we would have been able to come up with a clean universal pattern for data access such as the M-V-C pattern for User Interfaces or the Singleton pattern or the Factory pattern. Yet, today we find ourselves with a plethora of technologies and patterns that are constantly struggling to survive to the next release. Just ask yourself, how many iterations of the ORM paradigm have come and gone?

J2EE with EJBs and Hibernate must be the answer!

In the Java world, the creation of EJBs, Hibernate and its adoption into the J2EE spec was surely supposed to put an end to the Data Access Layer misery developers faced. But it soon became notorious for being overly complicated, bloated and expensive to develop and maintain. Developers found that it was easy to run queries or stored procedures straight against their databases and get their data directly. Many of them did so in shame, telling themselves, "This is a hack, but the ORM way is really the true way." It became obvious when they ran their applications that the apps ran better and faster, and were easier to read, without the full EJB functionality.

The notoriously heavy datasets

At one time it seemed like Microsoft with their DataSets had won over the database access world by providing simple disconnected objects that were easily bound to UIs and could run queries against the database, or even in memory once they were loaded. But alas, they became notoriously heavy, hard to maintain and transport, and had to be revamped. Microsoft has dabbled with many iterations of Data Access Layers, including Object Spaces, Active Recordsets, LINQ to SQL and Entity Framework. Some of these, like LINQ to SQL, are still in the market trying to survive; some, like Active Recordsets, have become part of history; some, like Object Spaces, never made it into sight; and some, like Entity Framework, claim to take over the future!

Why do none of the Data Access Layer Patterns work?

In my opinion, having used and worked on many projects that use the above-mentioned technologies, the only pattern I see is a pattern of failure. The reason is that these technologies try to mimic the database. They make it look like the user is abstracted from the database by doing what a database does instead of abstracting what a database should do. They completely miss the concept of abstraction. When I abstract something in my code, the job of that abstraction is to hide the internal complexity of something, not to repeat the actual work of what I am abstracting.

Failed data access layers abstract the database by repeating the work of databases. That is the number one rule of failure. How so, you may be asking? You may say, "I love writing my LINQ queries and not having to worry about SQL." The problem begins right there. ORM data access implementations have to implement their own query language! If you have to implement your own query language, you are venturing into the complex world of set theory and relational tuples. You had better be as good and as fast as a database engine at that, or else you will soon be history.

The Pattern of Redundancy instead of Abstraction

Caching database data in memory is another database copy-cat feature. You had better provide better caching, latency, and redundancy algorithms than the database, or you are going to face the wrath of the database engine pretty soon! I doubt any data access technology can beat the caching of query execution plans and of recently used data provided by the leading database engines, whether commercial ones like Oracle and SQL Server or free ones like MySQL. In the long term, database caches work better, even over the network, than the most powerful object caching providers I have worked with.

These features of database access patterns and technologies do not do justice to the database engines they try to tame, and therefore they soon get bypassed and lose out to their competitor: the no data access layer pattern.

What is the no data access layer pattern?

The no data access layer pattern is simple: access or persist your data through the native database provider, using the native query language of the database engine (SQL). If you stick to this pattern you will be more successful than not in your applications! The only rule of abstraction you should follow is this: use data access only to populate or persist the application objects that actually need it, and you should be good. Nothing more, nothing less!
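To make the pattern just described a bit more concrete, here is a minimal sketch of one way it might look in C#. It is an illustration under assumptions, not a definitive implementation: the Account class, the Account table and its columns, and the @lastName parameter syntax (SQL Server style) are all made up for the example. It uses only the provider-neutral System.Data interfaces, so any native provider's connection can be passed in.

```csharp
using System.Collections.Generic;
using System.Data;

// Hypothetical application object; only the fields the application needs.
public sealed class Account
{
    public long Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class AccountQueries
{
    // Plain SQL in the database's own language. Table, column and
    // parameter names are assumptions for this sketch.
    public const string SelectByLastName =
        "SELECT Id, FirstName, LastName FROM Account WHERE LastName = @lastName";

    // Runs the query through whatever native provider created the connection.
    // The caller is expected to pass an open connection.
    public static List<Account> FindByLastName(IDbConnection connection, string lastName)
    {
        using (IDbCommand command = connection.CreateCommand())
        {
            command.CommandText = SelectByLastName;

            IDbDataParameter parameter = command.CreateParameter();
            parameter.ParameterName = "@lastName";
            parameter.Value = lastName;
            command.Parameters.Add(parameter);

            using (IDataReader reader = command.ExecuteReader())
            {
                return ReadAccounts(reader);
            }
        }
    }

    // Populates application objects straight from the reader.
    public static List<Account> ReadAccounts(IDataReader reader)
    {
        List<Account> accounts = new List<Account>();
        int id = reader.GetOrdinal("Id");
        int firstName = reader.GetOrdinal("FirstName");
        int lastName = reader.GetOrdinal("LastName");

        while (reader.Read())
        {
            accounts.Add(new Account
            {
                Id = reader.GetInt64(id),
                FirstName = reader.GetString(firstName),
                LastName = reader.GetString(lastName)
            });
        }
        return accounts;
    }
}
```

The mapping is kept separate from the query only so that it can be exercised without a live database, for instance against a DataTable reader.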

Let’s take an oath to ourselves and say, “Thou shalt not be fooled by another Data Access Pattern or technology!” It is like looking for the Abominable Snowman. If there were any truth to it, we would have found it by now.


Regional Conflicts

We have all come to love our IDE features and take them for granted nowadays: IntelliSense, code completion, refactoring, code organization and, of course, the use of regions.

The #region and #endregion directives basically provide the ability to hide a block of code and collapse it into a short word or phrase given at the beginning of the block, with the IDE showing a plus sign to expand and collapse the region.

For example:

#region Public Methods

#endregion

Regions are handy and have become popular, and I use them myself occasionally. Still, I don’t particularly fancy them, and they don’t improve my code’s readability by much. I believe in keeping my code concise, using descriptive names for methods, classes, variables, etc., and keeping my classes short and easy to read. But sometimes a region for collapsing those using statements, or for grouping similar members like constructors, private variables and properties, disposing, etc., comes in handy.

As regions help you “hide” your code, some developers like to abuse them to hide shameful code.

Let me share with you an example I saw recently. Here the developer is mapping data from his objects to an external system based on keys. The method went on for a few hundred lines, with a mapping for every field the external system needed.

public string GetMapDataValue(string key, Account accountObject)
{
    switch (key)
    {
        #region Account Data

        case "AccountFirstName":
        case "FirstName":
            return accountObject.FirstName;

        case "AccountLastName":
            return accountObject.LastName;

        #endregion

        #region Address Data

        case "AddressLine1":
            return accountObject.PrimaryAddress.Line1;

        case "AddressLine2":
            return accountObject.PrimaryAddress.Line2;

        case "AddressCity":
            return accountObject.PrimaryAddress.City;

        case "AddressState":
            return accountObject.PrimaryAddress.State;

        case "AddressZip":
            return accountObject.PrimaryAddress.Zip;

        #endregion

        ……
    }
}


After collapsing the regions, the code looks like this:



[Image: the method above with its Account Data and Address Data regions collapsed]



This clearly shows that the developer wanted separate methods for mapping different pieces of data but instead chose to shove all the code into one switch statement, broken up into a region for each specific piece of information. It may look organized, but in reality it’s a poorly designed function that has been “regionized” to look readable. A classic case of hiding your dirty laundry in the closet.
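For comparison, here is one way the same mapping might be split into separate methods. This is only a sketch under assumptions: the Account and Address classes are guessed from the property names in the snippet above, and it covers only the keys shown there, not the elided ones. It also treats a null result as "not my key", which is a simplification.

```csharp
// Hypothetical shapes, reconstructed from the properties used in the original method.
public sealed class Address
{
    public string Line1 { get; set; }
    public string Line2 { get; set; }
    public string City { get; set; }
    public string State { get; set; }
    public string Zip { get; set; }
}

public sealed class Account
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public Address PrimaryAddress { get; set; }
}

public static class AccountMapper
{
    // Each group of keys gets its own short method instead of its own region.
    public static string GetMapDataValue(string key, Account account)
    {
        return GetAccountValue(key, account)
            ?? GetAddressValue(key, account.PrimaryAddress);
    }

    private static string GetAccountValue(string key, Account account)
    {
        switch (key)
        {
            case "AccountFirstName":
            case "FirstName":
                return account.FirstName;
            case "AccountLastName":
                return account.LastName;
            default:
                return null; // not an account key
        }
    }

    private static string GetAddressValue(string key, Address address)
    {
        switch (key)
        {
            case "AddressLine1": return address.Line1;
            case "AddressLine2": return address.Line2;
            case "AddressCity": return address.City;
            case "AddressState": return address.State;
            case "AddressZip": return address.Zip;
            default: return null; // unknown key
        }
    }
}
```

Each method now fits on a screen and has a name that says what it maps, which is the job the regions were being asked to do.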



Another strange use I have seen in the years since regions have been around is using them to group methods, as if each region were a class. This is a classic example from my procedural programmer friends who really have not gotten on the Object Oriented boat that left the shore a long time ago!



Here is an example:



public class DataAccess
{
    #region Account Data

    public void InsertAccount(Account account)
    {
    }

    public void DeleteAccount(long accountId)
    {
        ...
    }

    public void UpdateAccount(Account account)
    {
        ...
    }

    #endregion

    #region Order Data

    public void InsertOrder(Order order)
    {
        ...
    }

    public void DeleteOrder(long orderId)
    {
        ...
    }

    public void UpdateOrder(Order order)
    {
        ...
    }

    #endregion

    // ... you get the point ...
}


A class like this went on for the few hundred objects in the system that needed to be persisted and was no less than 10,000 lines of code! The developer boasted about the code to his colleagues and marveled at it. I remember working on this code file. It was so big that even IntelliSense started to choke. When I typed ‘this’ and hit the period key, the machine ran in overdrive for a few seconds before the list showed up! Now that is some serious regioning!



I don't want to give directions on how to use regions, but I will say that if you design your code well and make sure your program's functionality is broken down into understandable, reusable and maintainable classes, you will find that your code does not need to be regionalized much. Short, concise classes broken down into short, descriptive methods will take you far in keeping your code organized and readable. By default the IDE makes methods, classes, and namespaces collapsible, and that should be enough for the majority of the code you will write!
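As a rough sketch of what "short, concise classes" could mean for the DataAccess example above, each region might simply become its own small class. Everything here is hypothetical: the class names, the fields on Account and Order, and especially the in-memory dictionary, which only stands in for whatever real persistence calls the original methods made.

```csharp
using System.Collections.Generic;

// Hypothetical entities for the sketch.
public sealed class Account
{
    public long Id { get; set; }
    public string LastName { get; set; }
}

public sealed class Order
{
    public long Id { get; set; }
    public decimal Total { get; set; }
}

// One small class per persisted type instead of one region per type.
public sealed class AccountData
{
    // In-memory stand-in for the real database calls (assumption).
    private readonly Dictionary<long, Account> rows = new Dictionary<long, Account>();

    public int Count { get { return rows.Count; } }

    public void Insert(Account account) { rows.Add(account.Id, account); }

    public void Update(Account account) { rows[account.Id] = account; }

    public bool Delete(long accountId) { return rows.Remove(accountId); }
}

public sealed class OrderData
{
    // In-memory stand-in for the real database calls (assumption).
    private readonly Dictionary<long, Order> rows = new Dictionary<long, Order>();

    public int Count { get { return rows.Count; } }

    public void Insert(Order order) { rows.Add(order.Id, order); }

    public void Update(Order order) { rows[order.Id] = order; }

    public bool Delete(long orderId) { return rows.Remove(orderId); }
}
```

With a few hundred persisted types this gives a few hundred small files rather than one 10,000-line file, and the IDE's normal class and method folding does the rest.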

Var Wars – Abuse of the CSharp var

C# 3.0 introduced the var keyword for declaring local variables without having to explicitly specify the type. This was done to support the anonymous types returned from LINQ queries. Now I am seeing many developers use it all over their code and think it's a good thing.

Wasn’t C# developed to enforce strongly typed programming?

Var just makes it harder to infer the type when reading and maintaining code.

Take for example this LINQ Query:

var user = from u in Users
           where u.Status == UserStatus.Active
           select u;

It is hard enough to infer the type from the above query, let alone infer the type from a statement like this:

var index = 1;

Is index an int or a long?

Just because a compiler can infer the type does not mean a human should be forced to do so when reading the code. If we wrote code just to make compilers happy, we should be writing Assembly. Why then use an Object Oriented language? But since code is written for developers to read and maintain, and not just for compilers to compile, it should be as descriptive and readable as possible.
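For the record, here is a small sketch showing what the compiler actually picks in cases like these, next to the explicit declaration that spares the reader the guesswork. The User class, the UserStatus enum and the VarDemo helper are hypothetical names invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types for the sketch.
public enum UserStatus { Active, Inactive }

public sealed class User
{
    public string Name { get; set; }
    public UserStatus Status { get; set; }
}

public static class VarDemo
{
    // Explicit declaration: the reader sees at a glance that this is a
    // sequence of users, not a single user.
    public static IEnumerable<User> ActiveUsers(IEnumerable<User> users)
    {
        IEnumerable<User> active = from u in users
                                   where u.Status == UserStatus.Active
                                   select u;
        return active;
    }

    // What the compiler infers when the type is left out.
    public static Type[] InferredTypes()
    {
        var index = 1;    // inferred as int
        var total = 1L;   // inferred as long (because of the L suffix)
        var ratio = 1.0;  // inferred as double

        return new[] { index.GetType(), total.GetType(), ratio.GetType() };
    }
}
```

So index is an int, but the reader has to know the literal rules to be sure; writing int index = 1; removes the question.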

Some say it cuts down on repetitive code when used in a statement like this:

var user = new User();

For those who are so obsessed with cutting down characters and lines of code, here is a small piece of advice that will save you more coding time and characters: focus on better design and architecture. If you are that concerned about repetitive code, this is where your energy will be well rewarded. Well designed code is always shorter, cleaner, easier to read and also works better!

Cutting down on type names is like removing the wrapper from a drink. You have to infer what is in it by drinking it. For those who don’t care about their calories, go ahead, but for the rest of us, please leave the wrappers on!

The only time you really have to use var is for anonymous types. There is no other way to get an anonymous type. For everything else a type name can and should be used for variable declaration.
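A short sketch of that one legitimate case, assuming a hypothetical User class with FirstName and LastName properties: the projection below creates an anonymous type, which has no name you could write down, so var is the only option for those two locals. Everything else is declared with its type name.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical type for the sketch.
public sealed class User
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class AnonymousTypeDemo
{
    public static List<string> DescribeUsers(IEnumerable<User> users)
    {
        // The select creates an anonymous type, so var is required here.
        var rows = from u in users
                   select new
                   {
                       FullName = u.FirstName + " " + u.LastName,
                       Initial = u.LastName[0]
                   };

        // A named type exists, so it is spelled out.
        List<string> lines = new List<string>();

        // The loop variable is also of the anonymous type, hence var again.
        foreach (var row in rows)
        {
            lines.Add(row.Initial + ": " + row.FullName);
        }
        return lines;
    }
}
```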

So for those var abusers, spare the rest of us some inferring and just declare your variables with a type name.

Welcome

Welcome to my blog. You may have read some of my postings at www.orasissoftware.com/blog or StackOverflow.com talking about data access technologies, ORMs and code generation. From now on I will continue to blog on this site. I will also guest blog on http://www.wijix.com/, a site whose concept I really like; it has many useful, well-written utilities and code snippets for programmers to use.

Although I don’t want to limit myself to blogging on any particular topic, I will promise one thing about what you will find on this blog: everything will be original, informative, unique in perspective and always written with a critical eye. I like to raise the bar when it comes to software development. No company or technology is too big or too good to be criticized and challenged! So subscribe now and be part of the conversation!