Handling of data in information systems

3.3.1 Manual and Automatic Methods of Data Entry

All computer systems need to have data input to them; otherwise they have nothing to
process. The methods of collecting the data can be divided into two types: automatic and
manual data collection.

Automatic Data Collection
The most obvious type of automatic data collection is in a control system, where the
computer builds its view of the outside world from sensors that give information about the
physical environment. A sensor measures continuously, but the processor reads the data
only at fixed intervals. It does not need to know the temperature in a room all the time,
perhaps only every 5 minutes, which also gives the previous decision long enough to have
had some effect. Using only some of the available data in this way is known as sampling.

Many sensors that measure physical values are analogue sensors, while the data required
by the processor must be digital. An analogue sensor produces a signal that varies
continuously, usually as a changing voltage (for example, the resistance of a thermistor
changes with temperature, so a circuit built around it gives an output voltage that changes
continuously as the temperature being measured changes). This signal must be converted
into the stream of 0s and 1s that the computer can recognise, which is done by an analogue
to digital converter (ADC).

When data is collected off line, often by sensors in remote locations, and then stored until
it is convenient for the system to input it, this is known as data logging. A typical data logger
might take the form of a tape recorder on which the data is stored until a complete set has
been collected, so that it can be entered into the system in one go. This would obviously not
be suitable for a system controlling the central heating in a house, but a remote weather
station on a mountain top, where readings are taken every 10 minutes and radioed back to
the weather centre once every 24 hours, would need just such a device to store the data
until it was required.
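The sampling, conversion and logging steps described above can be sketched in a few lines of Python. This is a minimal illustration, not a real device driver: the sensor is simulated by a hypothetical `read_voltage` function, and the 8-bit converter, the 5 V reference voltage and the 10-minute interval are assumptions chosen to match the weather-station example.

```python
import math

SAMPLE_INTERVAL_MIN = 10  # assumed: one reading every 10 minutes
MINUTES_PER_DAY = 24 * 60
V_REF = 5.0               # assumed reference voltage of the converter
BITS = 8                  # assumed 8-bit analogue to digital converter


def read_voltage(minute: int) -> float:
    """Hypothetical analogue sensor: a voltage that varies smoothly over the day."""
    return 2.5 + 2.0 * math.sin(2 * math.pi * minute / MINUTES_PER_DAY)


def adc(voltage: float, bits: int = BITS, v_ref: float = V_REF) -> int:
    """Convert a continuous voltage into one of 2**bits whole-number levels."""
    top_level = 2 ** bits - 1                 # 255 for an 8-bit converter
    voltage = min(max(voltage, 0.0), v_ref)   # clamp to the converter's input range
    return round(voltage / v_ref * top_level)


def log_one_day() -> list[int]:
    """Sample the sensor at fixed intervals and store only the digital readings."""
    return [adc(read_voltage(minute))
            for minute in range(0, MINUTES_PER_DAY, SAMPLE_INTERVAL_MIN)]


if __name__ == "__main__":
    batch = log_one_day()
    # The whole batch would then be sent to the weather centre in one go.
    print(len(batch), "readings stored; first five:", batch[:5])
```

Changing `BITS` shows the trade-off: more bits give finer steps between levels, but more data to store.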
A less obvious form of automatic data collection is the barcode used in a supermarket. The
code is represented as a series of dark bars on a light background, so that the data can be
input to the machine without any further preparation.

Automatic data collection can be considered to be any data collection that combines the two
stages of data collection and data input to the system without going through the
intermediate phase of data preparation to make it suitable for computer use. Another good
example is a school register taken by making marks on a sheet of paper, which
can then be read directly into the computer, with no human intervention, by an optical mark
reader (OMR). An OMR reads information by translating the position of each mark on the
paper into a meaning, so that two marks side by side on the paper mean different things
because of where they are rather than what they look like.
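The idea that a mark's meaning comes from its position can be sketched as a lookup on a grid. This is an illustration under stated assumptions: the register layout (one row per pupil, columns for present, absent and late) and the pupil names are hypothetical, and the reader is assumed to have already reduced each box to "marked" or "not marked".

```python
# Hypothetical register layout: the meaning of a mark depends only on its column.
COLUMN_MEANINGS = ["present", "absent", "late"]
PUPILS = ["Asha", "Ben", "Chloe"]  # hypothetical order of rows on the form


def read_register(marks: list[list[bool]]) -> dict[str, str]:
    """Translate the position of each mark into a meaning.

    marks[row][col] is True where the reader detected a dark mark.
    """
    result = {}
    for pupil, row in zip(PUPILS, marks):
        marked = [col for col, is_dark in enumerate(row) if is_dark]
        if len(marked) != 1:
            result[pupil] = "unreadable"  # no mark, or more than one mark, in the row
        else:
            result[pupil] = COLUMN_MEANINGS[marked[0]]
    return result


if __name__ == "__main__":
    sheet = [
        [True, False, False],   # identical mark in column 0...
        [False, True, False],   # ...means something different in column 1
        [False, False, False],  # no mark at all
    ]
    print(read_register(sheet))
    # {'Asha': 'present', 'Ben': 'absent', 'Chloe': 'unreadable'}
```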

Other methods of entering data automatically are voice recognition, which is rather
unreliable but is an attempt by the computer to understand human speech, and magnetic
stripes, such as those on the back of credit cards and bank cards. The stripe holds
information about the owner of the card in a form that the computer can use directly.
Another form of data input used by banks is magnetic ink character recognition (MICR):
characters printed in magnetic ink along the bottom of each cheque before the cheque book
is sent to the account holder. The magnetic ink characters are particularly easy for the
computer to read and contain enough information to identify the bank and the account at
that bank. All of this can be done with no further human intervention after the original
printing of the cheque book. However, the data that the customer writes on the cheque
(who it is made out to and for how much) is not ready for input and so requires some
human intervention to make it usable.

Manual Data Entry
The most obvious method of manual data entry is a form designed to collect data that needs
to be input to the computer. An operator reads the data on the form and then types it into
the computer using a keyboard. An extra stage has been added here: the data has had to be
typed in, because the original data was not in a form acceptable to the computer.
Computer systems are available that read individual characters and input them without
the data having to be transcribed; this counts as automatic data collection and is known
as optical character recognition (OCR).

Questions on this part of the syllabus will ask you to suggest suitable input methods for
particular situations, and to give the advantages and disadvantages of particular forms of
data input in different situations.
3.3.2 Methods of Image Capture

Scanner
A scanner is a device that shines a strong light at a source document and then reads the
intensity of the reflected light. The surface of the document is divided into small rectangles,
or pixels, and the light intensity of each pixel is measured and reported to the computer as
a bitmap. Scanners come in different sizes. A typical example is the A4 flatbed scanner,
where the document is placed on a sheet of glass and scanned line by line. Another is the
hand-held scanner, which is rolled across the image a number of times, collecting a band
each time; the software then matches these bands up to produce the complete document.
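Both jobs described above, turning reflected-light readings into a bitmap and matching up the bands from a hand-held scanner, can be sketched in Python. This is a simplified illustration, not how any particular scanner works: the threshold of 128 on a 0 to 255 intensity scale is an assumption, bands are treated as rows stacked one under another, and the overlap must match exactly, whereas real stitching software has to tolerate small differences.

```python
Bitmap = list[list[int]]


def to_bitmap(intensities: Bitmap, threshold: int = 128) -> Bitmap:
    """Turn reflected-light readings (0 = dark, 255 = bright) into a 1-bit bitmap.

    A pixel that reflects little light is marked 1 (black); otherwise it is 0 (white).
    """
    return [[1 if value < threshold else 0 for value in row] for row in intensities]


def stitch(band_a: Bitmap, band_b: Bitmap) -> Bitmap:
    """Join two bands from a hand-held scanner by finding where they overlap.

    Tries the largest possible overlap first: if the last k rows of band_a are
    identical to the first k rows of band_b, only the new rows of band_b are added.
    """
    for overlap in range(min(len(band_a), len(band_b)), 0, -1):
        if band_a[-overlap:] == band_b[:overlap]:
            return band_a + band_b[overlap:]
    return band_a + band_b  # no overlap found: place the bands end to end


if __name__ == "__main__":
    first = to_bitmap([[20, 240], [230, 15], [10, 12]])
    second = to_bitmap([[230, 15], [10, 12], [250, 250]])
    print(stitch(first, second))  # [[1, 0], [0, 1], [1, 1], [0, 0]]
```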

Video Capture Card
A video picture is made up of a series of still images, which are changed approximately 25
times per second to fool the brain into thinking that the images are moving. A video
capture card is an interface board, fitted into one of the expansion slots of a computer, that
allows the computer to store the values of the screen pixels for one particular picture. In
other words, it allows the action to be frozen. A typical use of a video capture card is a
market stall that films a customer with a video camera and then selects one still image to
print onto a T-shirt.

Digital Camera
A digital camera works in a similar way to a film camera but does not store the image on
film. Instead, the image is stored electronically, enabling the user to download it into a
computer, manipulate it and print it out if desired.

Each of these image capture systems results in an electronic image being stored in the
computer system. Image manipulation software can then be used to alter or edit the image
in any way that is required. While this allows users to use their imagination, to tidy up
pictures or to crop them to remove unwanted parts, it also allows unscrupulous people to
produce pictures with very little foundation in reality. It used to be said that “The camera
never lies”; this is certainly no longer true, as the film Forrest Gump shows.
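Cropping, one of the edits mentioned above, simply keeps a rectangle of pixels and throws the rest away. A minimal sketch, assuming the image is held as a list of rows of pixel values; the function name and parameters are hypothetical.

```python
Bitmap = list[list[int]]


def crop(image: Bitmap, top: int, left: int, height: int, width: int) -> Bitmap:
    """Keep only the rectangle whose top-left corner is at (top, left).

    Rows above and below, and columns to the left and right, are discarded.
    """
    return [row[left:left + width] for row in image[top:top + height]]


if __name__ == "__main__":
    # A 3 x 4 test image whose pixel values encode their own position (row, column).
    image = [[10 * r + c for c in range(4)] for r in range(3)]
    print(crop(image, top=1, left=1, height=2, width=2))  # [[11, 12], [21, 22]]
```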
3.3.3 Types of Data on Entry

Free Text and Structured Data
Characters typed into the computer system are known as text. Most text, like the content of
this book, is structured using paragraphing, justification and many other formatting
features that make the content more accessible. Free text is data without this structuring.
Leaving out this extra detail makes the volume of data considerably smaller, and hence
more suitable for applications where the size of the data is important, such as sending
emails over the internet.
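The size difference between structured and free text can be illustrated by stripping the formatting out of a short passage. This sketch rests on an assumption: HTML-style tags stand in for the paragraphing, justification and style codes that a word processor actually stores, so the exact byte counts are illustrative only.

```python
import re

# Assumption: the structuring is represented by HTML-style tags, standing in for
# the paragraph, justification and style codes a word processor stores with text.
STRUCTURED = (
    '<p align="justify"><b>Handling of data</b></p>'
    '<p align="justify">All computer systems need data input.</p>'
)


def to_free_text(structured: str) -> str:
    """Strip the formatting codes and collapse the spacing, keeping only the characters."""
    without_tags = re.sub(r"<[^>]+>", " ", structured)
    return " ".join(without_tags.split())


if __name__ == "__main__":
    free = to_free_text(STRUCTURED)
    print(free)
    print(len(STRUCTURED.encode()), "bytes structured vs", len(free.encode()), "bytes free text")
```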

Transaction Data and Data Prepared Off Line
Transaction data is the name given to data that has either been collected automatically or
prepared manually for input to an application process. Preparing data off line normally
involves keying the data into a file so that it is ready for input to a process; once it has been
entered off line in this way, the data becomes transaction data.
For example, the data collected from a barcode at a point of sale terminal would be
transaction data, while keying in the data from the paper order forms of a mail order
catalogue firm would be preparing data off line.
3.3.4 Validation and Verification

When data is input to a computer system, it is only valuable if it is correct. If the data
contains errors, no amount of care in the programming will make up for them, and the
results produced can be expected to be unreliable. There are three types of error that can
occur with data on entry.

The first is that the data, while reasonable, is wrong. Suppose your date of birth is written
on a data capture form as 18th November 1983 but, unless you happen to have been born
on that day, it is wrong. It can be typed into the computer with the utmost care as 181183, it
can be checked by the computer to make sure that it is a sensible date, and it will then be
accepted as your date of birth despite the fact that it is wrong. The computer has no reason
to suspect an error: quite simply, you made a mistake when you filled out the original form.

The second type of error is when the operator typing in the data hits the wrong key and
types in 181193, or something similar. This error should be detected if a suitable check is
made on the input. This type of data checking is called a verification check.

The third type of error is when something is typed in that simply is not sensible. If the
computer knows that there are only 12 months in a year, it will know that 181383 must be
wrong, because nobody can be born in the thirteenth month. Checks on whether the data
is sensible are called validation checks.
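The date example above can be tried out directly. The sketch below, assuming dates are keyed as six digits in DDMMYY order, uses Python's standard `datetime.strptime` as the validation rule. It shows that 181383 is rejected, while both 181183 (possibly faulty data) and 181193 (a keying error) pass, which is why validation alone is not enough.

```python
from datetime import datetime


def is_sensible_date(ddmmyy: str) -> bool:
    """Validation check: could this six-digit DDMMYY string be a real date?"""
    if len(ddmmyy) != 6 or not ddmmyy.isdigit():
        return False
    try:
        datetime.strptime(ddmmyy, "%d%m%y")  # rejects month 13, 31st February, etc.
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    for entry in ["181183", "181193", "181383"]:
        print(entry, "accepted" if is_sensible_date(entry) else "rejected")
```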

Faulty Data
There is very little that can be done about faulty data except to let the owner of the data
check it visually on a regular basis. The personal information kept on the school
administration system about you and your family will be printed off at regular intervals so
that your parents can check to ensure that the stored information is still correct.

Verification
Verification means checking the input data against the original data to make sure that there
have been no transcription errors. The standard way to do this is to input the data twice.
The computer then compares the two sets of data (which should be identical), and if they
differ, the computer knows that one of the inputs is wrong. It cannot tell which one is wrong,
but it can ask the operator to check that particular input.
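Double entry verification amounts to a field-by-field comparison of the two keyings. A minimal sketch, assuming each keying is held as a dictionary of field names to values with the same fields in both; the field names are hypothetical.

```python
def verify(first_keying: dict[str, str], second_keying: dict[str, str]) -> list[str]:
    """Compare two independent keyings of the same form, field by field.

    Returns the fields whose values differ. The computer cannot tell which copy
    is wrong, so these fields are referred back to the operator to check.
    """
    return [field for field in first_keying
            if first_keying[field] != second_keying.get(field)]


if __name__ == "__main__":
    first = {"name": "David", "date_of_birth": "181183"}
    second = {"name": "David", "date_of_birth": "181193"}  # a slip of the finger
    print(verify(first, second))  # ['date_of_birth']
```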

Validation
The first thing is to dispel a common misinterpretation of validation. In section 1.6.6
checking of data was mentioned, specifically the use of parity bits to check data. This is
NOT validation. Parity bits and echoing back are techniques used to check that data has
been transmitted properly within a computer system (e.g. from the disk drive to the
processor), whereas validation checks are used to check the input of data to the system in
the first place.
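The difference can be seen in a short even-parity sketch. Parity only confirms that the bits arrived as they were sent; in the hypothetical example below, the value 13 passes the parity check perfectly even though it would be a nonsensical month, which only a validation check could catch. The 7-bit width is an assumption for illustration.

```python
def add_even_parity(bits: str) -> str:
    """Append a parity bit so that the total number of 1s is even."""
    parity = "1" if bits.count("1") % 2 else "0"
    return bits + parity


def parity_ok(received: str) -> bool:
    """An even number of 1s means no single-bit transmission error was detected."""
    return received.count("1") % 2 == 0


if __name__ == "__main__":
    # 13 as a 7-bit binary number: transmitting it correctly says nothing about
    # whether 13 is a sensible month, which only a validation check can decide.
    sent = add_even_parity(format(13, "07b"))
    print(sent, parity_ok(sent))
```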

Validation is a check on DATA INPUT to the system by comparing the data input with a set
of rules that the computer has been told the data must follow. If the data does not match up
with the rules, then there must be an error. There are many different types of validation
check that can be used to check input in different applications.
1. Range check. A mathematics exam is out of 100. A simple validation rule that the
computer can apply to any data that is input is that the mark must be between 0 and 100
inclusive. Consequently, a mark of 101 would be rejected by this check as being outside the
acceptable range.

2. Character check. A person’s name will consist of letters of the alphabet and sometimes a
hyphen or apostrophe. This rule can be applied to the input of a person’s name so that dav2d
will immediately be rejected as unacceptable.

3. Format check. Suppose an application is set up to accept national insurance (NI) numbers.
Each person has a unique NI number, but they all have the same format: 2 letters followed
by 6 digits followed by a single letter. A computer that knows this rule will reject ABC12345Z
because it is in the wrong format; it breaks the rule.

4. Length check. An NI number has 9 characters; if more or fewer than 9 characters are
keyed in, the data cannot be correct.

5. Existence check. A barcode is read at a supermarket checkout till. The code is sent to
the main computer, which searches for that code on the stock file. As the stock file contains
details of every item held in stock, a code that is not on the file cannot belong to a real item;
since the item obviously exists, the code must have been read wrongly.

6. Check digit. The code read from an item at the supermarket consists of numbers, one of
which is special: it is called the check digit. If a simple algorithm is applied to the other
numbers, the answer should be this special digit. When the code is read at the checkout till,
if the arithmetic does not give the check digit, the code must have been read wrongly and is
rejected; if it does, the code is accepted, and it is at this point that the till normally beeps.
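The first five checks in the numbered list above can each be written as a small function that returns True when the data obeys the rule. This is a sketch, not a production validator: the stock file and its barcodes are hypothetical, names are assumed to use only the ASCII letters A to Z, and the NI number check applies only the format rule stated above (real NI numbers have further restrictions on which letters may appear).

```python
import re

# Hypothetical stock file: barcode -> description.
STOCK_FILE = {"13743": "Baked beans", "20016": "Bread"}


def range_check(mark: int, low: int = 0, high: int = 100) -> bool:
    """1. The exam mark must lie between 0 and 100 inclusive."""
    return low <= mark <= high


def character_check(name: str) -> bool:
    """2. A name may contain only letters, hyphens and apostrophes."""
    return re.fullmatch(r"[A-Za-z'-]+", name) is not None


def format_check(ni_number: str) -> bool:
    """3. Two letters, then six digits, then one letter (the rule as stated above)."""
    return re.fullmatch(r"[A-Z]{2}[0-9]{6}[A-Z]", ni_number) is not None


def length_check(ni_number: str) -> bool:
    """4. An NI number must be exactly 9 characters long."""
    return len(ni_number) == 9


def existence_check(barcode: str, stock_file: dict[str, str] = STOCK_FILE) -> bool:
    """5. The code must be found on the stock file, otherwise it was misread."""
    return barcode in stock_file


if __name__ == "__main__":
    print(range_check(101), character_check("dav2d"), format_check("ABC12345Z"))
```

Note that ABC12345Z passes the length check but fails the format check, which is why applications usually combine several checks on the same item of data.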

As an example of calculating a check digit, suppose the number to be entered is 1374. A
check digit may be calculated by giving the digits weights of 5, 4, 3 and 2 and summing the
results.

5 x 1 = 5
4 x 3 = 12
3 x 7 = 21
2 x 4 = 8

The sum is 5 + 12 + 21 + 8 = 46.

The next step is to divide this sum by a suitable number and note the remainder. In this
case we shall use 7: 46 divided by 7 gives 6 remainder 4.

Finally, subtract the remainder from the divisor (7 in this case) to produce the check digit.
In this case the check digit is 7 - 4 = 3. This would be called a modulus 7 check digit. This
means that the number to be entered is 13743.
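The weighted-sum method above can be sketched as a short Python function that follows the steps described (weights 5, 4, 3 and 2, modulus 7, check digit equals divisor minus remainder); the function names are hypothetical. Working the example through, 5 + 12 + 21 + 8 = 46, which leaves a remainder of 4 when divided by 7, so the function returns 3.

```python
def check_digit(number: str, weights: tuple[int, ...] = (5, 4, 3, 2),
                modulus: int = 7) -> int:
    """Weighted-sum check digit: divisor minus the remainder of the weighted sum.

    When the remainder is 0 this gives 7 - 0 = 7, which is still a single digit.
    """
    total = sum(int(digit) * weight for digit, weight in zip(number, weights))
    return modulus - total % modulus


def is_valid(code: str) -> bool:
    """Recalculate the check digit from the first digits and compare it with the last."""
    return check_digit(code[:-1]) == int(code[-1])


if __name__ == "__main__":
    print(check_digit("1374"))  # 3, so the number entered is 13743
    print(is_valid("13743"))    # True
    print(is_valid("13473"))    # False: two digits swapped over
```

Because each position has a different weight, swapping two digits usually changes the sum, so transposition errors as well as single misread digits tend to be caught.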

A very common algorithm uses modulus 11; because this can produce a check digit of 10, X is
used in its place. A good example of the use of modulus 11 is the older 10-digit ISBNs used
on books.
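A sketch of the modulus 11 scheme as used for 10-digit ISBNs, assuming the standard ISBN-10 weights of 10 down to 2 on the first nine digits. Taking the result modulo 11 a second time means a remainder of 0 gives a check digit of 0 rather than 11, and a check digit of 10 is written as X. The function names are hypothetical.

```python
def isbn10_check_digit(first_nine: str) -> str:
    """Weights 10 down to 2, modulus 11; a check value of 10 is written as X."""
    total = sum(int(d) * w for d, w in zip(first_nine, range(10, 1, -1)))
    value = (11 - total % 11) % 11  # a remainder of 0 gives 0, not 11
    return "X" if value == 10 else str(value)


def isbn10_is_valid(isbn: str) -> bool:
    """Check a 10-digit ISBN, ignoring any hyphens between the groups."""
    isbn = isbn.replace("-", "")
    if len(isbn) != 10 or not isbn[:9].isdigit():
        return False
    return isbn10_check_digit(isbn[:9]) == isbn[9].upper()


if __name__ == "__main__":
    print(isbn10_is_valid("0-306-40615-2"))  # True
    print(isbn10_is_valid("0-306-40651-2"))  # False: two digits swapped over
```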
3.3.5 Output Formats

When data has been processed by a computer system it is necessary to report the results of
the processing. There are a number of different ways that the results can be reported to the
user.

Graphs
Graphs show trends very clearly. Different types of graph can illustrate different
characteristics, and when two variables need to be compared, a visual representation can
be very useful. However, the choice of scales is critical, because a badly chosen scale can
give a very misleading picture. Also, specific values are not easily read from a graph; from a
continuous line, values can only be read approximately.

Reports
A report is a hard copy printout of the values of variables. Its advantage is that it gives the
actual figures for the values specified by the user. However, skill may be needed to
interpret the significance of the figures, and figures presented without any context are
often of limited use.

Interactive Presentations
The previous forms rely on the format of the output being decided without the luxury of
seeing what the figures look like first. If the system allows the user to decide the type and
range of output required during the run, the user becomes positively involved, leading to
an interactive presentation in which the output can be adjusted to suit the situation.

Sound
Many applications do not lend themselves to a standard visual printout. Sound can be used
for output from some systems; obvious examples are voice synthesis for reporting to blind
people and an alarm system to protect property against burglars.

Video
Video is a visually satisfying form of output, but it takes large amounts of memory because
a great many pictures are needed to give the feel of continuous motion. Video is useful for
demonstrating techniques, where pages of instructions have little value if a simple video
can illustrate the point better.
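A quick calculation shows why video needs so much memory. The figures below are assumptions for illustration (640 x 480 pixels, 3 bytes per pixel for 24-bit colour, 25 frames per second) and describe uncompressed video; real video files are compressed and are much smaller.

```python
def raw_video_bytes(width: int, height: int, bytes_per_pixel: int,
                    frames_per_second: int, seconds: int) -> int:
    """Storage needed for uncompressed video: every frame is a full picture."""
    bytes_per_frame = width * height * bytes_per_pixel
    return bytes_per_frame * frames_per_second * seconds


if __name__ == "__main__":
    size = raw_video_bytes(640, 480, 3, 25, 60)  # one minute of 640 x 480 colour video
    print(f"{size:,} bytes")                     # 1,382,400,000 bytes
```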

Images
Images, or pictures, can be used to enhance understanding. These may be created using
graphics packages, may be scanned into the computer or imported from a camera. By using
a number of slightly different images, animation can be created.

Animations
Animations provide a good stimulus for an audience and can lead from one slide to another
in a slide based presentation. Animation takes considerably less processing power than
full-motion video, unless the image being animated is complex. However, animation is used
so often that it can come across as a boring technique that has just been added for ‘gloss’.

Care needs to be taken when using animation, as it can detract from the presentation if it
involves complex changes.

A simple animation can be created by using just two images. For example, storing two
pictures of a Christmas tree, one plain green and the other containing small coloured
circles, and switching quickly between them can create the illusion of flashing lights.
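The two-image Christmas tree can be sketched in Python by showing two stored pictures in turn. Here the pictures are hypothetical text drawings standing in for real image files, and the half-second delay is an assumption.

```python
import itertools
import time

# Hypothetical text "pictures" standing in for the two stored images.
PLAIN_TREE = "  *\n ###\n#####"
LIT_TREE = "  *\n #o#\n#o#o#"


def animate(frames: list[str], switches: int, delay: float = 0.5) -> list[str]:
    """Show the frames in turn, repeating, to give the illusion of motion."""
    shown = []
    for frame in itertools.islice(itertools.cycle(frames), switches):
        print(frame, end="\n\n")
        shown.append(frame)
        time.sleep(delay)  # pause so each picture stays on screen briefly
    return shown


if __name__ == "__main__":
    animate([PLAIN_TREE, LIT_TREE], switches=6)
```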
3.3.6 Output According to Target Audience

Imagine an intensive care ward in a hospital. There are six beds, each with a patient who is
being monitored by a computer, and the outputs are available to a variety of users. There is
a nurse at a desk at one end of the ward. The nurse has other duties, but is expected to
make rounds of the patients at regular intervals to check on their progress. Doctors come
round the ward twice a day to check on the patients and make any adjustments to their
medication.

If the computer system senses that a patient has suffered a relapse while the nurse is
sitting at the desk, a sensible output would be sound: some sort of alarm to draw the
nurse's attention to the fact that something is wrong. This may be accompanied by a
flashing light, or some other device, to show quickly which patient needs attention. When
the nurse goes around the patients to make a visual check of their condition, it is not
necessary to know exact figures for heart rate or blood sugar; a quick glance at a screen
showing a scrolling graph of the patient’s vital signs over the last 20 minutes will be
perfectly adequate. If the graph looks in any way abnormal, the nurse may need a printout
of the actual values of the variables for that patient to decide what action, if any, needs to
be taken. The doctor may well want to see a printout of all the variable values for the last
twenty four hours, particularly if something is happening to the patient that is difficult to
understand, because such historical data can hold the clue to present symptoms.
The doctor may change the medication or the parameters within which the patient can be
considered stable. This will involve the nurse resetting the scales of the graphical output,
or even resetting the parameters that set off the audible alarm, which involves the nurse in
using an interactive presentation with the system. Once a week the nurse takes a first aid
class at the local sixth form college. There are too many students for one to one teaching
all the time, so the college computer system has been loaded with demonstration software
showing an animation of the technique for artificial respiration.
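The choice between sounding the alarm and simply updating the scrolling graph, together with the nurse resetting the doctor's parameters, can be sketched as follows. The class and function names are hypothetical, and the heart-rate limits are illustrative numbers only, not clinical values.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    """Range within which the patient is considered stable (set by the doctor)."""
    low: float
    high: float


def choose_output(reading: float, limits: Limits) -> str:
    """Decide how the system should report one new reading."""
    if reading < limits.low or reading > limits.high:
        return "alarm"  # sound (and a flashing light) to attract the nurse
    return "graph"      # just add the point to the scrolling graph


if __name__ == "__main__":
    heart_rate_limits = Limits(low=50, high=110)  # illustrative values only
    print(choose_output(115, heart_rate_limits))  # alarm
    heart_rate_limits.high = 120                  # the nurse resets the parameter
    print(choose_output(115, heart_rate_limits))  # graph
```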

When considering output, always consider the importance of timeliness and relevance. Data
tends to have a limited life span, which can be different for the same data in different
situations. Heart rate data from 3 hours ago is not going to be important to the nurse
looking after the patient, but it may be of great value to the doctor as a clue to the reason
for a sudden change in condition. Some data is not relevant to particular situations,
however up to date it is. The fact that a patient has blue eyes has no bearing on their
physical state and so is not relevant in this example, although it may well be in other
circumstances.
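The idea that the same data has a different life span for different users can be sketched as a time-window filter: the nurse's view keeps only the last 20 minutes, while the doctor's view keeps the last 24 hours. The time of viewing and the heart-rate values below are hypothetical.

```python
from datetime import datetime, timedelta


def recent(readings: list[tuple[datetime, float]], now: datetime,
           window: timedelta) -> list[tuple[datetime, float]]:
    """Keep only the readings taken within `window` before `now`."""
    return [(taken, value) for taken, value in readings
            if timedelta(0) <= now - taken <= window]


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)  # hypothetical time of viewing
    readings = [
        (now - timedelta(hours=3), 72.0),     # heart rate 3 hours ago
        (now - timedelta(minutes=10), 95.0),  # heart rate 10 minutes ago
    ]
    print(len(recent(readings, now, timedelta(minutes=20))))  # nurse's view: 1
    print(len(recent(readings, now, timedelta(hours=24))))    # doctor's view: 2
```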
Example Questions

1. a) State two methods of data entry used by banks in their cheque system. (2)
b) Explain why banks find the use of your two examples suitable for this application. (4)
A. a)-Keying in of data, either to a disk for later entry, or directly onto the cheque in
machine readable form.
-MICR
b)-Keying in can be used to place the details of the payee onto the cheque together with
the amount,
-in machine readable form,
-or to store the data on a disk for future use.
-MICR is already in machine readable form
-placed on cheques at time of printing the cheque book
-means that complex figures like account number do not have to be keyed in which would
invite human error.

Notes: Although the answer is given as bullet points, the response expected would be a
prose explanation. Note that although MICR is not stated in the syllabus, the examiner
expects you to have a working knowledge of it and other input types not stated. The
syllabus actually says ‘including…’, in other words any input device may be alluded to in the
question, although, in practice, the devices used in the exam will be common and relevant
to a simple situation.

2. A small stall is to be opened, as part of a fairground, where the customer can have their
likeness printed on to the front of a sweatshirt. Describe two possible methods of capturing
the image to be printed. (4)

A. -Digital camera
-connected to the computer which uses
-software to crop and present the image to the printer.
-Camcorder where the image is sent to..
-a video capture card which produces a still image on a screen in the same way as a digital
camera does.

3. Explain the difference between free text and structured data. (2)

A. -Free text data has no structure but simply consists of the characters from the character
set.
-Structured data is characters from the character set in a framework created by the
features available in the software.

Notes: Very little can be asked about this section, because the terms have simple
definitions, or indeed about the previous section, which was covered by question 2.
Bullet point 2 of the syllabus does not require you to explain the technicalities of how the
systems work, so a simple description of the techniques and an understanding of their
possible uses is all that can be expected.

4. A mail order firm receives orders from customers on paper order forms. These are
keyed into the computer system by operators. The data that is to be keyed in includes the 5
digit article number, the name of the customer and the date that the order has been
received.

a) Explain how the data input would be verified. (3)
b) Describe three different validation routines that could be performed on the data. (6)

A. a)-Two operators would…
-independently key in the data
-The two copies of the data are then compared..
-by the software..
-and errors are reported to the operators.

b)-Article number can have a length check carried out on it..
-if there are not 5 characters then the article number must be wrong.
-Name of customer can be checked with a character check..
-any characters other than letters or hyphen or apostrophe must mean that the check has
been failed.
-The date can be subject to a range check (actually a number of range checks)..
-the first two characters (the day) must be between 01 and 31.

Notes: Do not use the verification technique of printing out the data so that the customer
can check it. This is a mail order company and hence this would be impossible. There are
many possible alternatives for part (b). Choose them carefully so that there are three
different ones and so that all three pieces of data are used. Notice that there are two
answers for each one, meaning that the examiner wants to know what the type of check is,
and also the rule that has to be followed by the data to be treated as valid.

5. A reaction vessel in a chemical plant is monitored, along with many others, by a
computer system using a number of sensors of different types. Describe three different
types of output that would be used by such a system, stating why such a use would be
necessary. (6)

A. -Graphs (of the temperature, pressure..) showing the general state of the reaction
vessel..
-to show the operator the trends in the vessel, for example shows clearly whether the
temperature is increasing.
-Report (of temperature)..
-while the graph shows a trend, the report gives precise figures.
-Sound
-an alarm would sound if the temperature went past a safe limit.
-Hard copy printout..
-to allow investigators to study problems that may cause a shutdown or unacceptably poor
product.

Note: While knowledge of a chemical reaction is not part of a computing syllabus, it is
reasonable to expect students to realise that heat, or some other sensible parameter, would
play an important part in the reaction.

6. Explain what is meant by the timeliness and relevance of data. (2)

A. -Timeliness is the concept that data changes over time and that data is only part of a
sensible solution for a short period before it becomes outdated.
-Relevance of data means that data has a bearing, or use, in that particular application.

Note: These are simple definition answers. Even so, the exact wording given here would not
be expected; any answer that has the essence of the correct answer would be acceptable.
Examiners are aware that students take these papers under a lot of pressure and in a
question like this will do their best to compensate as long as the germ of the answer is
there.