Program Design Section Two - Fundamental Concepts

Dave Higgins Consulting
Strategic Technology Consulting and Enterprise Architecture

Section 2: Fundamental Concepts

Introduction
To be useful, a method must rest on only a few fundamental principles that guide the development effort. DSPD has three such principles.

To be done correctly, program design must be:

Output Oriented
Logical before Physical, and
Data Structured

Each of these three concepts is explained in detail in the sections which follow.

Output Oriented Design
The phrase "output oriented" has several implications. First and perhaps most importantly, it means that if you do not know what the output is, then you should not begin to write a program. This stands to reason: how do you start to write a program if you don't know what it is supposed to produce? Therefore, the DSPD method places a heavy emphasis on studying the output requirements of a program and understanding those requirements thoroughly before any other "programming" tasks are done.

Output oriented also means that when done correctly, the design of the program and the program itself will primarily "explain how the output is created." This can be quite different from traditional programs which primarily "explain how the input is consumed."

Contrary to popular belief, the reason programs exist is to produce outputs, not to consume inputs; no useful program has ever been written that did not produce an output of some kind. The fact that inputs must be consumed in order to produce an output is viewed in DSPD as a secondary concern, one that is relegated to the "background" in the design and code rather than the "foreground."

Logical vs. Physical Design
Saying that program design must be "logical" before "physical" is a way of saying that the programmer should understand "what" needs to be done before worrying about "how" to do it using some programming language.

Consider: At a high level, the computer that is used (or even whether a computer is used at all) and the language that is used to produce an output are irrelevant. The data that must be output, the data that must be input, and the computation and derivation rules must be understood regardless of the hardware environment. These are called the "logical" requirements or logical design elements, and are necessarily more important than computer considerations (in other words, it doesn't matter how quickly a program runs or how beautifully it is structured if it produces the wrong results). The characteristics of the specific input and output devices used, the file access methods used, the programming languages used, and so on, are called the "physical" requirements or the physical design elements.

In DSPD we go to great pains to complete a logical design before adding the physical design elements. We also go to great lengths to keep the logical and physical portions of the design as separate as we can. This not only makes the program simpler, but greatly increases its maintainability.

Data Structured Design
One of the fundamental observations credited to Warnier is the nature of the relationship between the structure of data and the structure of a program. He observed that in the best of programs, the structure of the data is the same as the structure of the program. This striking (and remarkably simple) observation is one we exploit to its fullest in DSPD. Instead of trying to program by thinking about processing (what might be done first, what might be done next, and so on) we program based on the structure of the data. Although it may sound odd at first, it works!

This fundamental principle, however, is one best illustrated rather than discussed. Therefore be on the lookout for examples of data structuring as you dig deeper into this tutorial (we'll point them out as we go along, so don't worry). The thing you will probably notice immediately is that the design process tells us to investigate the structure of the data first. When we get around to building the "program," we will build the logic for the program around the structure of that data.
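As a first small illustration (a sketch of our own, not taken from Warnier's work), assume a hypothetical sales report whose data is a set of customers, each containing a set of order amounts. The names `sales` and `build_report` and the sample figures are invented for this example. Notice that the nesting of the loops simply follows the nesting of the data:

```python
# Hypothetical data: a set of customers, each containing a set of order amounts.
sales = {
    "Acme": [120.0, 80.0],
    "Baker": [50.0],
}

def build_report(data):
    """Return report lines; the loop nesting mirrors the data nesting."""
    lines = []
    grand_total = 0.0
    for customer, orders in data.items():   # once per customer
        customer_total = 0.0
        for amount in orders:               # once per order of that customer
            customer_total += amount
        lines.append(f"{customer}: {customer_total:.2f}")
        grand_total += customer_total
    lines.append(f"Total: {grand_total:.2f}")
    return lines

report_lines = build_report(sales)
print("\n".join(report_lines))
```

The report has one total line for the whole set, one line per customer, and one accumulation step per order, which is exactly the shape of the data.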

Some Basic Terminology: Sets and Subsets
Since Warnier's original LCP and DSPD share a common background in mathematics, it is worth noting a couple of terms from the branch known as set theory.

A "set" is a collection or group of things that all have something in common. For instance, a group of people who all work for the same company constitutes a set.

A "subset" is simply a set completely contained within another. Thus, the group of all parents who work for the same company is a subset of the set of all people who work for the company. If each member of one set is also a member of a second set, the first set is said to be a subset of the second.

In DSPD, we will mostly be concerned with investigating and manipulating sets of data (files, records, fields, and so on) and sets of processes (programs, subroutines, and so on).
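The set and subset ideas above can be sketched with Python's built-in `set` type. The employee names and the `is_parent` flag are hypothetical sample data, chosen only to echo the company example:

```python
# Hypothetical employee records: (name, is_parent)
employees = [("Ann", True), ("Bob", False), ("Cara", True)]

# The set of all people who work for the company.
staff = {name for name, _ in employees}

# The set of all parents who work for the company.
parents = {name for name, is_parent in employees if is_parent}

# Each member of `parents` is also a member of `staff`, so it is a subset.
print(parents.issubset(staff))
# The reverse does not hold: Bob is staff but not a parent.
print(staff.issubset(parents))
```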

Mappings
Another important concept from set theory is that of a "mapping." A mapping is simply a transformation from one set into another. Thus, a program is a kind of mapping, since it transforms the input set of data into the output set of data. Mappings are classified as being one of four types, distinguished from one another by their relative complexity.

The simplest kind of mapping, a one-to-one mapping, occurs whenever each element in the input set is transformed into one and only one element of the output set. The two other simple mappings, the many-to-one and the one-to-many, are similar in that they transform multiple elements of input into a single element of output, and single elements of input into multiple elements of output, respectively.

The fourth kind of mapping, the many-to-many, is much more complicated than the other three. It defines a relationship where many inputs go into producing many outputs in a complex manner.
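To make the four types concrete, here is a minimal sketch that classifies a mapping given as a collection of (input, output) pairs. The function name `classify` and the pair representation are our own assumptions for illustration; this is one reasonable reading of the definitions above, not a DSPD tool.

```python
from collections import defaultdict

def classify(pairs):
    """Classify a mapping given as (input, output) pairs."""
    outputs_of = defaultdict(set)   # input element  -> outputs it maps to
    inputs_of = defaultdict(set)    # output element -> inputs that map to it
    for src, dst in pairs:
        outputs_of[src].add(dst)
        inputs_of[dst].add(src)
    fans_out = any(len(v) > 1 for v in outputs_of.values())  # one input, many outputs
    fans_in = any(len(v) > 1 for v in inputs_of.values())    # many inputs, one output
    if fans_out and fans_in:
        return "many-to-many"
    if fans_out:
        return "one-to-many"
    if fans_in:
        return "many-to-one"
    return "one-to-one"

print(classify([("a", 1), ("b", 2)]))            # each input has its own output
print(classify([("a", 1), ("b", 1)]))            # two inputs share one output
print(classify([("a", 1), ("a", 2)]))            # one input yields two outputs
print(classify([("a", 1), ("a", 2), ("b", 1)]))  # both at once
```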

Since producing programs which are the equivalent of the three simple mappings is relatively easy to do, the only mapping that presents the programmer with any real challenge is the complex or many-to-many mapping. Fortunately, we can take a lesson from mathematics and avoid many-to-many mappings altogether. Read on...

We know from set theory that any complex mapping can be expressed as a series of simpler mappings. For example, consider the relationships suggested by the diagram below...

If any customer can buy any product, there is a many-to-many relationship between customer and product. However, suppose each customer may have many sales agents assigned, while each agent may sell only one product. Then the customer-to-agent relationship is one-to-many and the agent-to-product relationship is many-to-one, so the one complex mapping has been replaced by two simple ones.
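The customer, agent, and product example can be sketched in a few lines. The agent names and assignments below are hypothetical, and we assume that each agent is assigned to exactly one customer (which is what makes customer-to-agent one-to-many). Composing the two simple mappings reproduces the many-to-many relationship between customers and products:

```python
# Hypothetical data: each agent serves one customer and sells one product.
agent_customer = {"agent1": "Acme", "agent2": "Acme",
                  "agent3": "Baker", "agent4": "Baker"}
agent_product = {"agent1": "widgets", "agent2": "gears",
                 "agent3": "widgets", "agent4": "gears"}

# Customer -> agents (one-to-many), built by inverting agent_customer.
agents_of = {}
for agent, customer in agent_customer.items():
    agents_of.setdefault(customer, []).append(agent)

# Follow customer -> agent, then agent -> product (many-to-one).
customer_product = {
    (customer, agent_product[agent])
    for customer, agents in agents_of.items()
    for agent in agents
}

for pair in sorted(customer_product):
    print(pair)
```

Every customer ends up paired with several products and every product with several customers, yet neither of the two mappings we actually stored is many-to-many.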

The Basic DSPD Strategy
The idea that complex mappings or programs can always be broken apart into a series of simpler mappings will serve us well in program design. For as we mentioned earlier, a program is a kind of mapping.

Traditionally, programs have been viewed as follows...

For programs which represent simple mappings, this model is fine. But it doesn't help much when programs are more complex. Also, this view of a program tends to support the creation of "monolithic" code--one big "chunk" of code that is difficult to break apart into small sections, and where changing code in one section is likely to have unpredictable effects on other sections far removed.

The DSPD strategy is to break apart a program into three major modules: a physical input mapping, a logical output mapping, and a physical output mapping.

The middle mapping, the Logical Output Mapping, is the core of the program. It describes how to produce the output by detailing when calculations should be done, when input should be obtained, and when output should be sent. The trick is that the logical mapping ignores where the data is actually coming from or where it is actually going! When the logical mapping wants data it will ask the physical input mapping to give it "perfect" data: i.e., just the data it needs and in just the order it wants it. And when the logical mapping is ready to output something, it doesn't worry about the format or device, it simply hands the data over to the physical output mapping, which formats it and sends it to the output device.
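A minimal sketch of this three-part arrangement follows. It is an illustration under our own assumptions rather than a prescribed DSPD implementation: the function names, the comma-separated input format, and the payroll-style records are all hypothetical. Python generators stand in for the idea that the logical mapping asks for data only when it wants it.

```python
import io

def physical_input(source):
    """Physical input mapping: knows how the data is stored (here,
    comma-separated text) and hands over clean (name, hours, rate) records."""
    for line in source:
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # hide blank lines and comments
        name, hours, rate = line.split(",")
        yield name.strip(), float(hours), float(rate)

def logical_output(records):
    """Logical output mapping: describes what the output contains and when
    calculations happen, with no knowledge of files, devices, or layouts."""
    total = 0.0
    for name, hours, rate in records:
        pay = hours * rate
        total += pay
        yield name, pay                   # one detail item per record
    yield "ALL", total                    # one total item at the end

def physical_output(items, device):
    """Physical output mapping: formats each item and sends it to the device."""
    for label, amount in items:
        device.write(f"{label:<10}{amount:>10.2f}\n")

raw = io.StringIO("# name, hours, rate\nAnn, 10, 20.0\n\nBob, 5, 30.0\n")
out = io.StringIO()
physical_output(logical_output(physical_input(raw)), out)
print(out.getvalue(), end="")
```

Only `physical_input` knows about comment lines and commas, and only `physical_output` knows about column widths; `logical_output` sees nothing but the records it asked for.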

This strategy incorporates an important concept called "information hiding." Since the physical output mapping "knows" about the format and requirements of the output device (like a screen or a printer), the logical mapping doesn't have to. Hence, the type of output device is "hidden" from the logical mapping. Similarly, the physical input mapping knows where and how data is actually stored. The logical mapping does not; it only knows what data it needs (which is not necessarily what is stored). Therefore, if the data gets reorganized or changed in any way, only the physical input mapping needs to be updated, and the logical mapping is totally unaffected.

This property of information hiding is a critical one when considering future modifications. Since the three parts of the program each deal with different issues, there is no "ripple effect" when changes occur; each of the three parts may be modified independently of the others. Furthermore, the type of modification to be done can be used to predict the part of the program that will be affected. Changing the output device, for instance, will only impact the physical output mapping. Changing or reorganizing the database will only affect the physical input mapping.
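To see how a change of output format stays in one place, consider the following sketch. The names `to_text` and `to_csv` and the sample records are hypothetical; the point is only that the same logical mapping is reused, unmodified, with two different physical output mappings.

```python
import io

def logical_output(records):
    """Logical mapping (hypothetical): one (name, pay) item per record."""
    for name, hours, rate in records:
        yield name, hours * rate

def to_text(items, device):
    """Physical output mapping for a plain-text listing."""
    for name, pay in items:
        device.write(f"{name}: {pay:.2f}\n")

def to_csv(items, device):
    """Physical output mapping for comma-separated output."""
    device.write("name,pay\n")
    for name, pay in items:
        device.write(f"{name},{pay:.2f}\n")

records = [("Ann", 10.0, 20.0), ("Bob", 5.0, 30.0)]
text_out, csv_out = io.StringIO(), io.StringIO()

# Switching formats means choosing a different physical output mapping;
# logical_output is called in exactly the same way both times.
to_text(logical_output(records), text_out)
to_csv(logical_output(records), csv_out)

print(text_out.getvalue(), end="")
print(csv_out.getvalue(), end="")
```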

In addition, this strategy supports reusable code modules and design elements. A physical output mapping that produces a printed report, for instance, can be generic enough to use for many different programs. This allows the program developer to create programs more quickly by using off-the-shelf components.


This web site and all material contained herein is Copyright ©2002-2009 Dave Higgins. All Rights Reserved.