DATABASE ARCHITECTURE AND APPLICATION DEVELOPMENT

BSM Development not only has years of experience building database applications on top of a number of commercial databases but we have participated directly in the design and construction of a high-performnce, commercial database engine (IDMS/R), as well as an open source database engine (Typhoon). This gives us an excellent perspective when it comes to choosing a database engine to meet your needs, architecting a database to efficiently hold your data, and building applications that take full advantage of the data in your database.

Choosing a Database

The requirements for building a database engine, much like those for building a file system, are quite simple. One puts some data in; sometime later, one asks for and receives the same data back. Only when reality intrudes in the form of fifty million records do things get complicated — one would like to have the answer back quickly.

Fortunately, the problem has largely been solved by a number of commercial databases with which we at BSM Development are familiar: Oracle; MySQL; Informix; DB2; MS SQLServer. In our experience, all of these databases do a fine job and are easy enough to develop software for. Not sure which one to use? We can help you choose, based on cost, performance requirements, platform, or personal preference. Especially in high-performance situations, our years of experience in actually building database engines, as well as building database applications, will prove valuable to you in choosing a database that offers a solid foundation on which to build.

Deploying a Database

BSMDevelopment can also assist you with database deployment, since we have a firm understanding of the available hardware platforms, from multi-processor systems, to storage arrays, to virtualization solutions. We also have intimate knowledge of the various operating system choices available (such as Windows, Linux, Unix and other proprietary systems) to run your database including optimal installation and configuration parameters.

Couple this with our familiarity with various application development environments and BSM Development can assist you with a complete solution for your database/application needs. Even if you want to run your application on multiple platforms, we can offer you a solution because we have successfully developed single-source-code applications that ran transparently on multiple databases (e.g. MS SQLServer and MySQL) and on multiple operating systems (e.g. Windows and Linux). We can assist you to do likewise or help you to port an existing application to a new or additional platform.

Database Architecture/Design

The pressure is on. The database architect must design a database schema that will allow your application to support all of the features that you need. The only problem is that, until your users actually start using the application, it is very hard to predict what data really needs to be there. This can lead to a poorly-designed database schema which can seriously impact performance.

The trick is to create a schema that contains all of the currently-needed data, arranges it logically in groups of items that are frequently accessed together, indexes it for good performance or so that it can be quickly joined to answer the most frequent queries, and still allows for future data to be added with ease. This is where BSM Development's experience can pay off in providing you with a solid database design.

We can assist you with determining what data needs to be collected, identifying logical groupings and analyzing data relationships for optimal schema design. If you wish, we can deliver an overall schema design, of the big-picture type, that shows all data relationships, using our CAD software (e.g. sample database schema wall-hanging). Or, when it comes time to build your database, we can deliver self-documenting table-constructors that can be used to create the actual database tables (as often as needed) which can be checked in to your source code control system for posterity (e.g. typical MS SQLServer table-constructor).

Application Design

BSM Development can assist you with your initial database application design or we can build your entire application from start to finish, as required. We employ a number of time-tested techniques in the construction of database applications that will be of benefit to your projects. Using these methods results in a more robust and solid product that will stand the test of time and ensure that the data in your database is always useful to you and your users.

Record Initializers

In order to allow the flexibility to add fields to database tables at any time in the future and to accommodate growth or unanticipated needs, we always try to constrain record initialization and setup to single modules that can be called by any code that needs a new record. Along with the intelligent choice of default field values in the table-constructors, this more than anything allows new data to be added easily. Only one module, other than those that actually use the new data, need ever be altered when changes are made.

Data Editing

One of the really unfortunate things about databases is that the data in them is put in there by humans. The database architect spends months identifying all of the data, its relationships, etc., and builds a set of tables that will hold everything in perfect order. It is a thing of beauty. And then, the users come along with their data and muck it all up.

We believe that a database is only as good as the quality of its data and, in this respect, one must be ever vigilant. The first line of defense is the data editing routine. Our experience with actual user data allows us to help you build routines to defend against bad data. We at BSM Development have seen how the users mangle street addresses and we know how to correct them on-the-fly. We understand that the users love duplicate records and know how to detect and prevent them. We know about providing real-time spell checking (e.g. through Ajax on Web applications) so that prose is understandable. And, we are very familiar with numerous data normalization techniques that can be applied to the likes of zip codes, phone numbers, time and date stamps, to ensure consistent data. Data editing isn't the only answer but, with our help, it will be a great first step for your applications.

Data Verification

Apart from data editing techniques, additional data verification methods can be employed as the second tier in your defense against bad data. We make the distinction between data editing and data verification when the checking requires a second level of lookup, for example, verification of a postal code in an external database. BSM Development can show you how to build tables that assist in verification, how to utilize external databases that are readily available elsewhere (e.g. USGS geo-data on the Internet) and how to cache or access this data to provide better performance.

Data Integrity

One doesn't always have the luxury to ensure data integrity in real time. Often performance constraints preclude it. Fortunately, in the case of many databases, there are lots of hours during the day when the humans are eating, sleeping, or getting their fill of situation comedies, when the database applications can be quietly healing the data. Our philosophy is to define sets of criteria that good, well-behaved data must meet and then to create automatic background tasks that check these criteria and perform repairs when need be. Such data integrity tasks, as we at BSM Development call them, can carry out detailed data verification and perform referential checks that are not possible in real time (e.g. duplicate checking is an O(n**2) operation), during off-peak times when cycles are available.

For a typical duplicate record integrity check, we often apply the concept of record scoring. To begin with, we might determine whether records duplicate one another and are candidates for deletion using name similarity scores which can be calculated with algorithms such as: exact match; similarity; Levenshtein; and Soundex or Metaphone. Any or all of these algorithms can be used, with their results being added together after applying appropriate weights to arrive at a similarity score. Those records that are close to one another in scores can be deemed duplicates. If automatic deletion is to be done, deletion scores can be calculated based on record completeness, data accuracy, etc. Often duplicate records result from abortive attempts to create new records, in which case record completeness is a good indicator of deletability. Record timestamps are another good indicator of which record to keep and which to delete. Our experience in building database applications gives us valuable insight into what types of integrity checks make sense and will work for your application.

We believe that data integrity checking is an important means of maintaining pristine data, a means that is all too often overlooked when database applications are built. We can show you how to build these checks into your systems so that your data's health is ensured. The results can be well worth the effort and essentially come for free. As a wise man once said, "All computers wait at the same speed." We might as well put them to work.

Automatic Repair

In the previous paragraphs we touched on the subject of integrity checking database records. For any records that fail integrity checks, we like to automatically repair them, if possible. If errors are frequent, automatic repair is certainly the best option, since the drudgery of constantly fixing up errors will soon annoy the humans. From our experience, BSM Development can help you to identify the types of repairs that are possible and automate them. Failed referential links can often be rebuilt from intrinsic information, for example. We can also identify areas of improvement and assist you to retrofit changes to eliminate common sources of errors. The process of improving database health is an iterative one with which we are quite familiar.

Automatic repair tasks should be able to run unattended at all times. We believe that they should be scheduled to run at frequent intervals (e.g. daily). If interruptions occur, the repair task should take up where it last left off with no human intervention. For extremely large databases, we can help you to implement incremental integrity checks and automatic repairs that are spread over multiple passes.

Incidentally, while we're on the subject of integrity checking and data repair, we at BSM Development believe that these checks can serve a dual purpose. If the various types of past data discrepancies and anomalies, along with the effects of any application bugs and their fixes are codified in the data integrity checks on an ongoing basis, these checks can become a sort of validation suite that can be applied against the database whenever anything out of the ordinary is done to it (e.g. backup/restore). We can show you the modular integrity check approach that we use which easily accommodates new tables, additional checks, etc., as needed.

Notification

Although our goal is to always recover from database problems automatically when they are found by any of the integrity checks, things don't always go according to plan. If something does go awry, we believe that the appropriate support people should be notified through regular channels. This being the case, we can show you how to send email messages, text messages or tweets, or all three, depending on your preferences, to your support personnel in the event of problems with the automatic data repair tasks that run against your database.

In the case of email messages, we prefer to include full instructions about how the problem should be repaired, right in the email message, so that the repairs can begin immediately. As an alternative, we can even help you to implement voicemail messages that can be forwarded through the regular telephone system or via VOIP, using a text-to-speech converter, such as Festival, and a programmable telephone switch, such as Asterisk.

If you'd like BSM Development to assist you with your database design or in building your database applications, please contact BSM Development and describe how you think we can help you, along with your contact information. We can handle the entire project for you — from choosing a database and architecting a database solution through completion and delivery of a finished application.