Local Data Model for .NET Database Application

Wiktor Zychla [wzychla_at_ii.uni.wroc.pl]

Download the Local Data Model Generator source code.

1 Theory

There are many different methodologies of database programming; in fact, a single universal approach simply does not exist. A professional developer should not be deceived by common concepts that work well in typical situations but can be painful in a specific context. Although there are some common guidelines for database programming, we will start by looking at the issue from a wider perspective.

Relational databases store data in tables. When an application asks for data, the rows that match the query are sent back to the application. What the developer should think about is: how should the application store the data locally so that it can be used effectively?

Before we actually try to answer that question, we will talk about local data models. A local data model can be defined as the way the application stores data in its internal data structures and the way it manipulates that data. Four elements form the local data model: the amount of data, the data structures, the communication strategy, and the concurrency strategy. Each is discussed below.

Let us assume that the local data model is class-oriented. By a class-oriented data model I mean a data model in which there is a strict correspondence between tables in the database and classes in the object model. For example, if a database with a class-oriented data model has two tables, T_CUSTOMERS and T_ORDERS, then the application's object model will contain two corresponding classes, C_Customer and C_Order. With this assumption in place, let us look at the elements of the local data model.

Amount of data

How much data should the application load from the database? Should it load all the data, or only the data required to build the current view?

Data structures

What data structures should be used to store the data in memory? The brief answer is: collections, for sure. This leads to the common question: what kind of collection should be used?

The answer is: it is up to you. A plain array is useful because of fast access to consecutive elements. The ArrayList is useful because it can grow and shrink without much overhead. The Hashtable is useful because a data row's key can be used as the hash key, providing fast access to data that is related inside the database.
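As a purely illustrative snippet of the last point, a row's primary key can serve directly as the hash key (the values below are made up):

Hashtable cities = new Hashtable();
cities.Add( 5, "Wroclaw" );           // key = database ID of the row
cities.Add( 8, "Gdansk" );
string name = (string)cities[5];      // constant-time lookup by the row's ID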

What about the structure of classes?

Communication strategy

How should the application inform the database server about the changes the user makes to the data?

Concurrency strategy

Should the application inform concurrent instances about the changes made by the user? Or should it block concurrent access to the data? There are many possible scenarios, and some of them are supported directly by database systems. The considerations above are of course not complete, because the respective parts of the data model can interact in many ways. For example, we could imagine an application that loads a part of the data into tree-like data structures, tracks changes in the respective views only, and disallows any concurrency.

How, then, should the data model for a real application be chosen? It depends on many factors.

The final decision should be weighed carefully, because a wrong choice can lead to a real disaster, especially when the model must be revised at an advanced stage of development.

2 Practice

When a project enters the implementation stage, it is usually too late for experiments; a stable and tested solution is highly desirable. The model presented below has worked efficiently in several desktop applications, and I will certainly use it again. It also demonstrates a general rule: the local data model of a database application can be developed from scratch to match specific purposes. The .NET developer is not restricted to the way .NET sees the database world.

Context

First, let us define the context we work in. The database consists of several dozen tables. Estimation reveals that the biggest tables will hold no more than several hundred thousand records, and several tables will contain only a few dozen records. All data will be loaded into memory at once.

The model is then usable for relatively small databases and does not scale well to big databases (millions of records).

Basics of the model

Because of the assumptions we have made, the application can load all the data into memory at start time. The local data model will contain classes that correspond to database tables. For example, the tables
CREATE TABLE [dbo].[S_CITIES] (
	[ID] [int] IDENTITY (1, 1) NOT NULL ,
	[name] [varchar] (28)
) ON [PRIMARY]

CREATE TABLE [dbo].[U_USERS] (
	[ID] [int] IDENTITY (1, 1) NOT NULL ,
	[ID_CITY] [int] NOT NULL ,
	[surname] [varchar] (28),
	[dateOfBirth] [datetime] NOT NULL 
) ON [PRIMARY]
will have the corresponding classes
public class CS_City
{
	public int ID;
	public string name;
		
	public override string ToString()
	{
		return this.name;
	}

	public CS_City()	{}
}

public class CU_User 
{
	public int ID;
	public int ID_CITY;
	public string surname;
	public DateTime dateOfBirth;
		
	public override string ToString()
	{
		return this.surname;
	}

	public CU_User()	{}
}
Look at the definitions above. The relations between tables are modelled but not handled directly. Note that ID_CITY is a foreign reference to the S_CITIES table. To handle relations efficiently, we will use Hashtables to actually store the data
public class C_PData
{
	public static C_OHashTable hS_Cities =
		new C_OHashTable( typeof(CS_City), "S_CITIES", "ID", "Cities");

	public static C_OHashTable hU_Users =
		new C_OHashTable( typeof(CU_User), "U_USERS", "ID", "Users");

	private C_PData() {}
}
and fast Hashtable lookups to pick up related data
public class CU_User
{
	...
	public CS_City City
	{
		get
		{
			return (CS_City)C_PData.hS_Cities[ID_CITY];
		}
	}
}
(Do not worry about C_OHashTable for now; the class inherits directly from Hashtable and basically carries some additional information about the collection: its name, the type of its elements, and so on.)
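Since C_OHashTable appears throughout the examples, here is a minimal sketch of what it might look like. The names ItemType, NameOfTable and NameOfKey are taken from the code later in this article; FriendlyName is merely an assumed name for the fourth constructor argument:

public class C_OHashTable : Hashtable
{
	public readonly Type   ItemType;     // type of the elements stored in the collection
	public readonly string NameOfTable;  // database table the collection mirrors
	public readonly string NameOfKey;    // name of the primary key column
	public readonly string FriendlyName; // display name of the collection (assumed)

	public C_OHashTable( Type itemType, string nameOfTable, string nameOfKey, string friendlyName )
	{
		ItemType     = itemType;
		NameOfTable  = nameOfTable;
		NameOfKey    = nameOfKey;
		FriendlyName = friendlyName;
	}
}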

This works as expected. Having an element of type CU_User, we can ask for the corresponding city:

CU_User user;
...
string cname = user.City.name; // get the name of user’s city
Any relation between tables can be handled in a similar manner.
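For example, the reverse one-to-many direction, from a city to its users, can be exposed in the same spirit. The Users property below is a hypothetical addition, not part of the model described here; it simply scans the users collection (a backward index could be introduced if this ever became a bottleneck):

public class CS_City
{
	...
	// hypothetical reverse lookup: all users that live in this city
	public ArrayList Users
	{
		get
		{
			ArrayList result = new ArrayList();
			foreach ( CU_User u in C_PData.hU_Users.Values )
				if ( u.ID_CITY == this.ID )
					result.Add( u );
			return result;
		}
	}
}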

Loading the data from the database

Note that both classes, CS_City and CU_User, have some elements in common. It is therefore wise to write a base class that serves as a container for these common elements.
public abstract class C_Base
{
	public int   ID;

	...		

	public override int GetHashCode()
	{
		return ID.GetHashCode();
	}
}

public class CU_User : C_Base { ...
public class CS_City : C_Base { ...
To load the data from the database, we will use one clever routine that populates consecutive tables, using reflection to obtain information about the fields. All fields that should be initialized with database data must be marked beforehand with the C_Field custom attribute, to distinguish them from fields that do not come from the database.
public class CU_User 
{
	[C_Field()]
	public int ID_CITY;
	[C_Field()]
	public string surname;
	[C_Field()]
	public DateTime dateOfBirth;
	
	...
}
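The C_Field attribute itself carries no data; a minimal sketch of it (the AttributeUsage setting is my assumption) could be:

[AttributeUsage( AttributeTargets.Field )]
public class C_Field : Attribute
{
	public C_Field() {}
}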

public const string SSelect0 = "SELECT * FROM {0}";

public static void Init_Table( C_OHashTable hDict, bool bClearTable )
{
	SqlCommand     cmd;
	object         item;
	SqlDataReader r = null;
			
	cmd = new SqlCommand( String.Format( SSelect0, hDict.NameOfTable ), C_PData.SqlConn );			
	r = cmd.ExecuteReader();

	if ( bClearTable==true ) hDict.Clear();

	while ( r.Read() )
	{
		item = Activator.CreateInstance( hDict.ItemType );

		foreach ( FieldInfo fI in hDict.ItemType.GetFields() )
			foreach ( object oA in fI.GetCustomAttributes(true) )
				if ( oA is C_Field )
				{
					hDict.ItemType.InvokeMember( fI.Name, BindingFlags.SetField, null, item, 
							new object[] { r[fI.Name] } );
				}
		C_Base bItem   = (C_Base)item;
		bItem.ID_DB    = bItem.ID;
		bItem.modified = false;				
		hDict.Add ( bItem.ID, item );
	}

	// close the reader so the shared connection can be reused for the next table
	r.Close();
}
Look carefully at what is going on here. We read all records from the table (SELECT * FROM …), then we step through the rows and pick from the SqlDataReader the columns that correspond to fields marked with C_Field in the class definition.

This way the Init_Table method is polymorphic: it only needs a reference to a collection, and it can then populate that collection with data from the database.

public static void InitAllData()
{
	...
	C_PDataInits.Init_Table ( C_PData.hS_Cities, true );  
	C_PDataInits.Init_Table ( C_PData.hU_Users, true );  
	...
}
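Init_Table assumes that C_PData exposes an open connection, C_PData.SqlConn (and, later on, a transaction C_PData.sqlTran). These members are not shown above; a minimal sketch, with an obviously made-up connection string, might be:

public class C_PData
{
	...
	// shared connection and transaction assumed by the loading and storing routines
	public static SqlConnection  SqlConn;
	public static SqlTransaction sqlTran;

	public static void OpenConnection()
	{
		// the connection string below is only an example
		SqlConn = new SqlConnection( "server=(local);database=LocalModelDemo;integrated security=SSPI" );
		SqlConn.Open();
	}
}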

Updating the data

Dynamically updating the data is the heart of a database application. The user adds, deletes and modifies data during a session, and these modifications should not destroy any existing data.

In our model, changes will not be sent to the server immediately. Rather, they will be accumulated and sent to the server on demand, as one big batch.

The most interesting and difficult part is the dynamic conversation with the database server. Feedback is necessary because the server assigns IDs to newly created rows, and these newly assigned IDs must immediately be usable at the client side to store other newly created elements that reference them. What we need is a strategy for gathering the new IDs from the server and using them at the client side.

A newly created ID for an element E, returned by the server, could of course be propagated to all elements that are related to E. This, however, would be extremely inefficient. Imagine that in one session 100 new cities are added, together with 1000 new users who come from those cities. After each city is stored in the database, its new ID would have to be propagated to every user who comes from that city (so that the user could be stored in the database with the proper ID_CITY value). That would require either looping through the users collection 100 times (to find the users of each new city) or holding an array of backward references from cities to their users.

We will take another approach. Instead of propagating newly assigned IDs, we will keep TWO IDs in each element!

public abstract class C_Base
{
	[C_Field()]
	public int   ID;
	public int   ID_DB;
	public bool  modified;
	...
The first one, ID, is the client ID. I call it the client ID because it is used to link related elements at the client side. The second one, ID_DB, is assigned dynamically to new records by the database server and is used in the SQL queries of elements that reference this one.
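To make the distinction concrete, a hypothetical life cycle of the two IDs could look like this (the concrete numbers are made up):

// a city loaded from the database:         ID =  7, ID_DB =   7
// a city created during the session:       ID = -3, ID_DB =   0   (not stored yet)
// the same city after it has been stored:  ID = -3, ID_DB = 112   (assigned by the server)
// a user referencing that city keeps ID_CITY = -3 at the client side,
// but its INSERT query uses City.ID_DB = 112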

New records at client side

When a new element is created at the client side, it gets a fresh client ID.
public abstract class C_Base
{
	...
	static int   newID = 0;
	public static int NewID 
	{ 
		get
		{
			return newID--; 
		}
	}	
Such an element is then added to the proper collection, using this fresh ID as the hash key. Note that this way there are two categories of records in each collection: records loaded from the database, whose client ID equals their database ID, and records created locally, whose client ID is a number the database will never assign. No matter which category an element belongs to, it can always be referenced by other elements in other tables: the Hashtable key (= client ID) never changes, so Hashtable lookups are always valid.
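Creating a new record at the client side could therefore look roughly like this (a sketch only; someCity stands for some existing CS_City element):

CU_User newUser     = new CU_User();
newUser.ID          = C_Base.NewID;        // fresh client ID, never assigned by the database
newUser.ID_CITY     = someCity.ID;         // client ID of the related city
newUser.surname     = "Smith";
newUser.dateOfBirth = DateTime.Today;
newUser.modified    = true;                // mark it for the next store operation
C_PData.hU_Users.Add( newUser.ID, newUser );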

New records go to the database

Finally, there comes the moment of truth. Changes have been made and accumulated (by marking changed elements as modified), and all modified elements must now be stored in the database. We will ask each record for its own SQL query.
public abstract class C_Base
{
	...
	virtual public string InsUpdSQLQuery( C_OHashTable hDict )
	{
		return string.Empty;
	}
Because there are two categories of records, each record takes one of two paths:
public class CS_City : C_Base
{
	...
	const string SInsert_0 = "INSERT {0} "+
		"(name) "+
		"VALUES ('{1}') "+
		"SELECT CAST(SCOPE_IDENTITY() AS INT)";
	const string SUpdate_0 = "UPDATE {1} "+
		"SET name='{2}' "+
		"WHERE {3}={0} "+
		"SELECT {0}";

	override public string InsUpdSQLQuery( C_OHashTable hDict )
	{
		if ( this.ID_DB > 0 )
		{
			return String.Format( 
				SUpdate_0, ID_DB, 
				hDict.NameOfTable, 
				name,
				hDict.NameOfKey );				
		}
		else
		{
			return String.Format( 
				SInsert_0, 
				hDict.NameOfTable,
				name
				);  
		}
	}
}

public class CU_User : C_Base
{
	...
	const string SInsert_0 = "INSERT {0} "+
		"(ID_CITY, surname, dateOfBirth) "+
		"VALUES ({1}, '{2}', '{3}') "+
		"SELECT CAST(SCOPE_IDENTITY() AS INT)";
	const string SUpdate_0 = "UPDATE {1} "+
		"SET ID_CITY={2}, surname='{3}', dateOfBirth='{4}' "+
		"WHERE {5}={0} "+
		"SELECT {0}";

	override public string InsUpdSQLQuery( C_OHashTable hDict )
	{
		if ( this.ID_DB > 0 )
		{
			return String.Format( 
				SUpdate_0, ID_DB, 
				hDict.NameOfTable, 
				City.ID_DB,
				surname, dateOfBirth,
				hDict.NameOfKey );
		}
		else
		{
			return String.Format( 
				SInsert_0, 
				hDict.NameOfTable,
				City.ID_DB,
				surname, dateOfBirth
				);
		}
	}
}
Look carefully at what is going on here. If the element comes originally from the database, its ID_DB is positive (the field was initialized when the record was loaded from the database), and such an element produces an UPDATE query. All newly created elements produce an INSERT query, and that query returns the newly assigned ID back to the application. When a record needs an ID from another table, it asks the corresponding record for its ID_DB (the user asks for its city's ID_DB in the insert part of InsUpdSQLQuery). These newly assigned ID_DBs are gathered at the client side as each single insert/update query is sent to the database.
public static void InsUpd_Table_0( C_OHashTable hDict, int key )
{
	SqlCommand    cmd;				
	C_Base        item;

	item = (C_Base)hDict[key];

	cmd    = new SqlCommand( item.InsUpdSQLQuery(hDict), C_PData.SqlConn, C_PData.sqlTran );

	// assign a newly created database ID (or just copy the ID of updated element)
	item.ID_DB         = (int)cmd.ExecuteScalar();
	item.modified      = false;
}

public static void InsUpd_Table_0( C_OHashTable hDict )
{
	C_Base item;
	IEnumerator ie = hDict.Keys.GetEnumerator();
	while ( ie.MoveNext() )
	{
		item = (C_Base)hDict[ie.Current]; 
		if ( item.modified == true )
		{
			InsUpd_Table_0 ( hDict, (int)ie.Current );
		}
	}
}

public static void InsUpd_All()
{
	...
	InsUpd_Table_0 ( C_PData.hS_Cities );
	InsUpd_Table_0 ( C_PData.hU_Users );
}
Note that the collections have to be stored in the proper order. In particular, cities have to be stored before users, so that the server can assign new ID_DBs to the newly created cities, and these new ID_DBs can then be used when storing the newly created users.
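The storing routines above use C_PData.sqlTran, so the whole batch can be wrapped in a single transaction. A possible wrapper (the name StoreAll is made up) could look like this:

public static void StoreAll()
{
	C_PData.sqlTran = C_PData.SqlConn.BeginTransaction();
	try
	{
		InsUpd_All();
		C_PData.sqlTran.Commit();    // all accumulated changes become permanent at once
	}
	catch
	{
		C_PData.sqlTran.Rollback();  // nothing is stored if any single query fails
		throw;
	}
}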

3 Local Data Model Generator

What is described above is only a foundation that can be developed further into a much smarter model. In fact, what I personally use in practice is far more advanced. Nevertheless, this basic model is a good starting point for your own work.

To help you with your research, I have developed a simple tool called Local Data Model Generator. It allows you to connect to a selected database, examine it, and automatically produce all the class files that form a local data model for that database, according to the description above.

The tool is far from perfect and its usage is limited, but it is also a good starting point for further research.

Local Data Model Generator v.1.0 profile:

4 Appendix

I am not responsible for any damage caused by the ideas or software presented above. I apologize for any grammatical or spelling errors; I am not a native speaker.

Any feedback would be appreciated. Feel free to discuss any ideas from this paper or beyond it.

5 TODOs

This article omits several important issues. For example, concurrency is not discussed in the practical part. This is to be fixed.