Local Data Model for .NET Database Application

Wiktor Zychla [wzychla_at_ii.uni.wroc.pl]

Download the Local Data Model Generator source code.

1 Theory

There are many different methodologies of database programming; in fact, a single universal approach simply does not exist. A professional developer should not be deceived by common concepts that work well in typical situations but can be painful in a specific context. Although there are some common guidelines for database programming, we will start by looking at the issue from a wider perspective.

Relational databases store data in tables. When an application asks for data, the rows that match the query are sent back to the application. What the developer should think about is: how should the application store the data locally so that it can be used effectively?

Before we actually try to answer that question, we will talk about local data models. A local data model can be defined as the way the application stores data in its internal data structures and the way it manipulates that data. Four elements form the local data model: the amount of data, the data structures, the communication strategy, and the concurrency strategy. Each is discussed below.

Let us assume that the local data model is class-oriented. By a class-oriented data model I mean a data model in which there is a strict correspondence between tables in the database and classes in the object model. For example, if a database with a class-oriented data model has two tables, T_CUSTOMERS and T_ORDERS, then the application's object model will contain two corresponding classes, C_Customer and C_Order. With this assumption in place, let us look at the elements of the local data model.

Amount of data

How much data should the application load from the database? Should it load all the data, or only the data required to build the current view?

Data structures

What data structures should be used to store the data in memory? The brief answer is: collections, for sure. This leads to the common question: what kind of collection should be used?

The answer is: it is up to you. A plain array is useful because of fast access to consecutive elements. The ArrayList is useful because it can grow and shrink without much overhead. The Hashtable is useful because a data row's key can be used as the hash key, providing fast access to data that is related inside the database.
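As a purely illustrative snippet of the last point, a row's primary key can serve directly as the hash key (the values below are made up):

Hashtable cities = new Hashtable();
cities.Add( 5, "Wroclaw" );           // key = database ID of the row
cities.Add( 8, "Gdansk" );
string name = (string)cities[5];      // constant-time lookup by the row's ID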

What about the structure of classes?

Communication strategy

How should the application inform the database server about the changes the user makes to the data?

Concurrency strategy

Should the application inform concurrent instances about the changes made by the user? Or should it block concurrent access to the data? There are many possible scenarios, and some of them are supported directly by database systems. The considerations above are of course not complete, because the respective parts of the data model can interact in many ways. For example, we could imagine an application that loads a part of the data into tree-like data structures, tracks changes in the respective views only, and disallows any concurrency.

How, then, should the data model for a real application be chosen? It depends on many factors.

The final decision should be weighed carefully, because a wrong choice can lead to a real disaster, especially when the model must be revised at an advanced stage of development.

2 Practice

When a project enters the implementation stage, it is usually too late for experiments; a stable and tested solution is highly desirable. The model presented below has worked efficiently in several desktop applications, and I will certainly use it again. It also demonstrates a general rule: the local data model of a database application can be developed from scratch to match specific purposes. The .NET developer is not restricted to the way .NET sees the database world.

Context

First, let us define the context we work in. The database consists of several dozen tables. Estimation reveals that the biggest tables will hold no more than several hundred thousand records, and several tables will contain only a few dozen records. All data will be loaded into memory at once.

The model is then usable for relatively small databases and does not scale well to big databases (millions of records).

Basics of the model

Because of the assumptions we have made, the application can load all the data into memory at start time. The local data model will contain classes that correspond to database tables. For example, the tables
CREATE TABLE [dbo].[S_CITIES] (
	[ID] [int] IDENTITY (1, 1) NOT NULL ,
	[name] [varchar] (28)
) ON [PRIMARY]

CREATE TABLE [dbo].[U_USERS] (
	[ID] [int] IDENTITY (1, 1) NOT NULL ,
	[ID_CITY] [int] NOT NULL ,
	[surname] [varchar] (28),
	[dateOfBirth] [datetime] NOT NULL 
) ON [PRIMARY]
will have the corresponding classes
public class CS_City
{
	public int ID;
	public string name;
		
	public override string ToString()
	{
		return this.name;
	}

	public CS_City()	{}
}

public class CU_User 
{
	public int ID;
	public int ID_CITY;
	public string surname;
	public DateTime dateOfBirth;
		
	public override string ToString()
	{
		return this.surname;
	}

	public CU_User()	{}
}
Look at the definitions above. The relations between tables are modelled but not handled directly. Note that ID_CITY is a foreign reference to the S_CITIES table. To handle relations efficiently, we will use Hashtables to actually store the data
public class C_PData
{
	public static C_OHashTable hS_Cities =
		new C_OHashTable( typeof(CS_City), "S_CITIES", "ID", "Cities");

	public static C_OHashTable hU_Users =
		new C_OHashTable( typeof(CU_User), "U_USERS", "ID", "Users");

	private C_PData() {}
}
and fast Hashtable lookups to pick up related data
public class CU_User
{
	...
	public CS_City City
	{
		get
		{
			return (CS_City)C_PData.hS_Cities[ID_CITY];
		}
	}
}
(Do not worry about C_OHashTable for now; the class inherits directly from Hashtable and basically carries some additional information about the collection: its name, the type of its elements, and so on.)
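Since C_OHashTable appears throughout the examples, here is a minimal sketch of what it might look like. The names ItemType, NameOfTable and NameOfKey are taken from the code later in this article; FriendlyName is merely an assumed name for the fourth constructor argument:

public class C_OHashTable : Hashtable
{
	public readonly Type   ItemType;     // type of the elements stored in the collection
	public readonly string NameOfTable;  // database table the collection mirrors
	public readonly string NameOfKey;    // name of the primary key column
	public readonly string FriendlyName; // display name of the collection (assumed)

	public C_OHashTable( Type itemType, string nameOfTable, string nameOfKey, string friendlyName )
	{
		ItemType     = itemType;
		NameOfTable  = nameOfTable;
		NameOfKey    = nameOfKey;
		FriendlyName = friendlyName;
	}
}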

This works as expected. Having an element of type CU_User, we can ask for the corresponding city:

CU_User user;
...
string cname = user.City.name; // get the name of user’s city
Any relation between tables can be handled in a similar manner.
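For example, the reverse one-to-many direction, from a city to its users, can be exposed in the same spirit. The Users property below is a hypothetical addition, not part of the model described here; it simply scans the users collection (a backward index could be introduced if this ever became a bottleneck):

public class CS_City
{
	...
	// hypothetical reverse lookup: all users that live in this city
	public ArrayList Users
	{
		get
		{
			ArrayList result = new ArrayList();
			foreach ( CU_User u in C_PData.hU_Users.Values )
				if ( u.ID_CITY == this.ID )
					result.Add( u );
			return result;
		}
	}
}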

Loading the data from the database

Note that both classes, CS_City and CU_User, have some elements in common. It is therefore wise to write a base class that serves as a container for these common elements.
public abstract class C_Base
{
	public int   ID;

	...		

	public override int GetHashCode()
	{
		return ID.GetHashCode();
	}
}

public class CU_User : C_Base { ...
public class CS_City : C_Base { ...
To load the data from the database, we will use one clever routine that populates consecutive tables, using reflection to obtain information about the fields. All fields that should be initialized with database data must be marked beforehand with the C_Field custom attribute, to distinguish them from fields that do not come from the database.
public class CU_User 
{
	[C_Field()]
	public int ID_CITY;
	[C_Field()]
	public string surname;
	[C_Field()]
	public DateTime dateOfBirth;
	
	...
}
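The C_Field attribute itself carries no data; a minimal sketch of it (the AttributeUsage setting is my assumption) could be:

[AttributeUsage( AttributeTargets.Field )]
public class C_Field : Attribute
{
	public C_Field() {}
}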

public const string SSelect0 = "SELECT * FROM {0}";

public static void Init_Table( C_OHashTable hDict, bool bClearTable )
{
	SqlCommand     cmd;
	object         item;
	SqlDataReader r = null;
			
	cmd = new SqlCommand( String.Format( SSelect0, hDict.NameOfTable ), C_PData.SqlConn );			
	r = cmd.ExecuteReader();

	if ( bClearTable==true ) hDict.Clear();

	while ( r.Read() )
	{
		item = Activator.CreateInstance( hDict.ItemType );

		foreach ( FieldInfo fI in hDict.ItemType.GetFields() )
			foreach ( object oA in fI.GetCustomAttributes(true) )
				if ( oA is C_Field )
				{
					hDict.ItemType.InvokeMember( fI.Name, BindingFlags.SetField, null, item, 
							new object[] { r[fI.Name] } );
				}
		C_Base bItem   = (C_Base)item;
		bItem.ID_DB    = bItem.ID;
		bItem.modified = false;				
		hDict.Add ( bItem.ID, item );
	}

	// close the reader so the shared connection can be reused for the next table
	r.Close();
}
Look carefully at what is going on here. We read all records from the table (SELECT * FROM …), then we step through the rows and pick from the SqlDataReader the columns that correspond to fields marked with C_Field in the class definition.

This way the Init_Table method is polymorphic: it only needs a reference to a collection, and it can then populate that collection with data from the database.

public static void InitAllData()
{
	...
	C_PDataInits.Init_Table ( C_PData.hS_Cities, true );  
	C_PDataInits.Init_Table ( C_PData.hU_Users, true );  
	...
}
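Init_Table assumes that C_PData exposes an open connection, C_PData.SqlConn (and, later on, a transaction C_PData.sqlTran). These members are not shown above; a minimal sketch, with an obviously made-up connection string, might be:

public class C_PData
{
	...
	// shared connection and transaction assumed by the loading and storing routines
	public static SqlConnection  SqlConn;
	public static SqlTransaction sqlTran;

	public static void OpenConnection()
	{
		// the connection string below is only an example
		SqlConn = new SqlConnection( "server=(local);database=LocalModelDemo;integrated security=SSPI" );
		SqlConn.Open();
	}
}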

Updating the data

Dynamically updating the data is the heart of a database application. The user adds, deletes and modifies data during a session, and these modifications should not destroy any existing data.

In our model, changes will not be sent to the server immediately. Rather, they will be accumulated and sent to the server on demand, as one big batch.

The most interesting and difficult part is the dynamic conversation with the database server. Feedback is necessary because the server assigns IDs to newly created rows, and these newly assigned IDs must immediately be usable at the client side to store other newly created elements that reference them. What we need is a strategy for gathering the new IDs from the server and using them at the client side.

A newly created ID for an element E, returned by the server, could of course be propagated to all elements that are related to E. This, however, would be extremely inefficient. Imagine that in one session 100 new cities are added, together with 1000 new users who come from those cities. After each city is stored in the database, its new ID would have to be propagated to every user who comes from that city (so that the user could be stored in the database with the proper ID_CITY value). That would require either looping through the users collection 100 times (to find the users of each new city) or holding an array of backward references from cities to their users.

We will take another approach. Instead of propagating newly assigned IDs, we will keep TWO IDs in each element!

public abstract class C_Base
{
	[C_Field()]
	public int   ID;
	public int   ID_DB;
	public bool  modified;
	...
The first one, ID, is the client ID. I call it the client ID because it is used to link related elements at the client side. The second one, ID_DB, is assigned dynamically to new records by the database server and is used in the SQL queries of elements that reference this one.
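To make the distinction concrete, a hypothetical life cycle of the two IDs could look like this (the concrete numbers are made up):

// a city loaded from the database:         ID =  7, ID_DB =   7
// a city created during the session:       ID = -3, ID_DB =   0   (not stored yet)
// the same city after it has been stored:  ID = -3, ID_DB = 112   (assigned by the server)
// a user referencing that city keeps ID_CITY = -3 at the client side,
// but its INSERT query uses City.ID_DB = 112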

New records at client side

When a new element is created at the client side, it gets a fresh client ID.
public abstract class C_Base
{
	...
	static int   newID = 0;
	public static int NewID 
	{ 
		get
		{
			return newID--; 
		}
	}	
Such an element is then added to the proper collection, using this fresh ID as the hash key. Note that this way there are two categories of records in each collection: records loaded from the database, whose client ID equals their database ID, and records created locally, whose client ID is a number the database will never assign. No matter which category an element belongs to, it can always be referenced by other elements in other tables: the Hashtable key (= client ID) never changes, so Hashtable lookups are always valid.
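Creating a new record at the client side could therefore look roughly like this (a sketch only; someCity stands for some existing CS_City element):

CU_User newUser     = new CU_User();
newUser.ID          = C_Base.NewID;        // fresh client ID, never assigned by the database
newUser.ID_CITY     = someCity.ID;         // client ID of the related city
newUser.surname     = "Smith";
newUser.dateOfBirth = DateTime.Today;
newUser.modified    = true;                // mark it for the next store operation
C_PData.hU_Users.Add( newUser.ID, newUser );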

New records go to the database

Finally, there comes the moment of truth. Changes have been made and accumulated (by marking changed elements as modified), and all modified elements must now be stored in the database. We will ask each record for its own SQL query.
public abstract class C_Base
{
	...
	virtual public string InsUpdSQLQuery( C_OHashTable hDict )
	{
		return string.Empty;
	}
Because there are two categories of records, each record takes one of two paths:
public class CS_City : C_Base
{
	...
	const string SInsert_0 = "INSERT {0} "+
		"(name) "+
		"VALUES ('{1}') "+
		"SELECT CAST(SCOPE_IDENTITY() AS INT)";
	const string SUpdate_0 = "UPDATE {1} "+
		"SET name='{2}' "+
		"WHERE {3}={0} "+
		"SELECT {0}";

	override public string InsUpdSQLQuery( C_OHashTable hDict )
	{
		if ( this.ID_DB > 0 )
		{
			return String.Format( 
				SUpdate_0, ID_DB, 
				hDict.NameOfTable, 
				name,
				hDict.NameOfKey );				
		}
		else
		{
			return String.Format( 
				SInsert_0, 
				hDict.NameOfTable,
				name
				);  
		}
	}
}

public class CU_User : C_Base
{
	...
	const string SInsert_0 = "INSERT {0} "+
		"(ID_CITY, surname, dateOfBirth) "+
		"VALUES ({1}, '{2}', '{3}') "+
		"SELECT CAST(SCOPE_IDENTITY() AS INT)";
	const string SUpdate_0 = "UPDATE {1} "+
		"SET ID_CITY={2}, surname='{3}', dateOfBirth='{4}' "+
		"WHERE {5}={0} "+
		"SELECT {0}";

	override public string InsUpdSQLQuery( C_OHashTable hDict )
	{
		if ( this.ID_DB > 0 )
		{
			return String.Format( 
				SUpdate_0, ID_DB, 
				hDict.NameOfTable, 
				City.ID_DB,
				surname, dateOfBirth,
				hDict.NameOfKey );
		}
		else
		{
			return String.Format( 
				SInsert_0, 
				hDict.NameOfTable,
				City.ID_DB,
				surname, dateOfBirth
				);
		}
	}
}
Look carefully at what is going on here. If the element comes originally from the database, its ID_DB is positive (the field was initialized when the record was loaded from the database), and such an element produces an UPDATE query. All newly created elements produce an INSERT query, and that query returns the newly assigned ID back to the application. When a record needs an ID from another table, it asks the corresponding record for its ID_DB (the user asks for its city's ID_DB in the insert part of InsUpdSQLQuery). These newly assigned ID_DBs are gathered at the client side as each single insert/update query is sent to the database.
public static void InsUpd_Table_0( C_OHashTable hDict, int key )
{
	SqlCommand    cmd;				
	C_Base        item;

	item = (C_Base)hDict[key];

	cmd    = new SqlCommand( item.InsUpdSQLQuery(hDict), C_PData.SqlConn, C_PData.sqlTran );

	// assign a newly created database ID (or just copy the ID of updated element)
	item.ID_DB         = (int)cmd.ExecuteScalar();
	item.modified      = false;
}

public static void InsUpd_Table_0( C_OHashTable hDict )
{
	C_Base item;
	IEnumerator ie = hDict.Keys.GetEnumerator();
	while ( ie.MoveNext() )
	{
		item = (C_Base)hDict[ie.Current]; 
		if ( item.modified == true )
		{
			InsUpd_Table_0 ( hDict, (int)ie.Current );
		}
	}
}

public static void InsUpd_All()
{
	...
	InsUpd_Table_0 ( C_PData.hS_Cities );
	InsUpd_Table_0 ( C_PData.hU_Users );
}
Note that the collections have to be stored in the proper order. In particular, cities have to be stored before users, so that the server can assign new ID_DBs to the newly created cities, and these new ID_DBs can then be used when storing the newly created users.
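The storing routines above use C_PData.sqlTran, so the whole batch can be wrapped in a single transaction. A possible wrapper (the name StoreAll is made up) could look like this:

public static void StoreAll()
{
	C_PData.sqlTran = C_PData.SqlConn.BeginTransaction();
	try
	{
		InsUpd_All();
		C_PData.sqlTran.Commit();    // all accumulated changes become permanent at once
	}
	catch
	{
		C_PData.sqlTran.Rollback();  // nothing is stored if any single query fails
		throw;
	}
}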

3 Local Data Model Generator

What is described above is only a foundation that can be developed further into a much smarter model. In fact, what I personally use in practice is far more advanced. Nevertheless, this basic model is a good starting point for your own work.

To help you with your research, I have developed a simple tool called Local Data Model Generator. It allows you to connect to a selected database, examine it, and automatically produce all the class files that form a local data model for that database, according to the description above.

The tool is far from perfect and its usage is limited, but it is also a good starting point for further research.

Local Data Model Generator v.1.0 profile:

4 Appendix

I am not responsible for any damage caused by the ideas or software presented above. I apologize for any grammatical or spelling errors; I am not a native speaker.

Any feedback would be appreciated. Feel free to discuss any ideas from this paper or beyond it.

5 TODOs

This article omits several important issues. For example, concurrency is not discussed in the practical part. This is to be fixed.