|
Handling of Data in Information Systems

3.3.1 Manual and Automatic Methods of Data Entry

All computer systems need to have data input to them, otherwise they have nothing to process. The methods of collecting the data can be divided into two types: automatic and manual data collection.

Automatic Data Collection

The most obvious type of automatic data collection is in a control system, where the computer collects its view of the outside world from sensors that give information about the physical environment. The data collection done by a sensor is continuous, but the data is read only at fixed intervals: the processor does not need to know the temperature in the room all the time, but perhaps every 5 minutes, which gives the previous decision long enough to have had some effect. The use of only some of the available data is known as sampling.

Many sensors that measure physical values are analogue sensors, while the data required by the processor needs to be digital. Analogue data is physical data that creates a signal consisting of a continuously changing voltage (for example, a thermistor increases the voltage output as the temperature which it is measuring increases). This signal must be changed into the stream of 0s and 1s that the computer can recognise. This is done by an analogue to digital converter.

When data is collected off line, often by sensors in remote locations, and then stored until it can be input at a time that is convenient to the system, it is known as data logging. A typical data logger will be in the form of a tape recorder, on which the data is stored until a complete set has been collected and can be entered into the system in one go.
Obviously, this would not be a suitable form of data input for a system controlling the central heating in a house, but a remote weather station on a mountain top, where different readings are taken every 10 minutes and then radioed back to the weather centre once every 24 hours, would need just such a device to store the data until it was required.

A less obvious form of automatic data collection is the barcode in a supermarket. The code is translated into a series of dark coloured bars on a light background so that the data can be input to the machine without any further preparation. Automatic data collection can be considered to be any data collection that carries out the two stages of data collection and data input to the system without going through the intermediate phase of data preparation to make the data suitable for computer use.

Another good example is the school register, which is taken by making marks on a sheet of paper that can then be read directly into the computer, with no human intervention, by an optical mark reader (OMR). An OMR reads information by translating the position of a mark on the paper into a meaning, so that two marks side by side on the paper mean different things because of where they are rather than what they look like.

Other forms of automatic data entry are voice recognition, which is rather unreliable but is an attempt by the computer to understand human communication, and magnetic stripes. These are seen on the back of credit cards and bank cards. The stripe contains information about the owner of the card in a form that the computer can use directly.

Another form of data input used by banks is the magnetic ink characters that are printed on the bottom of cheques before they are sent to the account holder, a technique known as magnetic ink character recognition (MICR). The magnetic ink is particularly easy for the computer to read and contains enough information to identify the bank and the account at that bank.
All of this can be done with no further human intervention after the original printing of the cheque book. However, the data that is written on the cheque by the customer (who it is made out to and for how much) is not ready for input and hence requires some human intervention to make it usable.

Manual Data Entry

The most obvious example is the form that has been designed to collect data which then needs to be input to the computer. An operator reads the data on the form and types it into the computer via a keyboard. An extra stage has been added here: the data has had to be typed in because the original data was not in a form acceptable to the computer. Computer systems are available that will read individual characters and input them without the data having to be transcribed. This is known as optical character recognition (OCR) and would count as automatic data collection.

Questions on this part of the syllabus will ask you to suggest suitable input methods for particular situations, and to give advantages and disadvantages of particular forms of data input in different situations.

3.3.2 Methods of Image Capture

Scanner

A scanner is a device that shines a strong light at a source document and then reads the intensity of the reflected light. The surface of the document is divided into small rectangles, or pixels, and the light intensity of each pixel is measured and reported to the computer as a bit map. Scanners come in different sizes. Typical is an A4 sized flat bed scanner, where the document is placed on a sheet of glass and scanned line by line. Another is the hand held scanner, which is rolled across the image a number of times, collecting a band each time; these bands can then be matched up by the software to produce the complete document.

Video Capture Card

A video picture is made up of a series of images which are changed approximately 25 times per second in order to fool the brain into thinking that the images are moving.
A video capture card is an interface board which fits into one of the expansion slots in a computer and allows the processor to store the values of the screen pixels for a specific picture. In other words, it allows the action to be frozen. A typical example of the use of a video capture card is the market stall that uses a video camera to film a customer and then selects one image to print onto a T shirt.

Digital Camera

A digital camera works in a similar way to an optical camera but does not store the image on film. Instead, the image is stored electronically, enabling the user to download it into a computer, manipulate it and print it out if desired.

Each of these image capture systems results in an electronic image being stored in the computer system. Image manipulation software can then be used to alter or edit the image in any way that is required. While this allows users to use their imagination, to tidy up pictures or to crop them to remove unwanted parts of the image, it also allows unscrupulous people to produce pictures with very little foundation in reality. It used to be said that “the camera never lies”. This is certainly no longer true: witness the film Forrest Gump.

3.3.3 Types of Data on Entry

Free Text and Structured Data

Characters typed into the computer system are known as text. Most text, like the content of this book, is structured by including paragraphing, justification and many other syntax controlled features to make the content more accessible. Free text data is data without this structuring. Cutting down the detail within a text in this way makes the volume of data considerably smaller and hence more suitable for applications where the size of the data is important, like sending emails over the internet.

Transaction Data and Data Prepared Off Line

Transaction data is the name given to data that has either been collected automatically or prepared manually for data input to an application process.
Data prepared off line normally involves the keying in of data to a file so that it is ready for data input to a process. In the process of being entered off line, the data becomes transaction data. The data collected at the point of sale terminal from a barcode would be transaction data, while the keying in of data from the paper order forms of a mail order catalogue would be preparing data off line.

3.3.4 Validation and Verification

When data is input to a computer system, it is only valuable if it is correct. If the data is in error in any way, then no amount of care in the programming will make up for the erroneous data, and the results produced can be expected to be unreliable. There are three types of error that can occur with the data on entry.

The first is that the data, while reasonable, is wrong. If your birthday is written down on a data capture form as 18th of November 1983, it will (except in very rare cases) be wrong. It can be typed into the computer with the utmost care as 181183, it can be checked by the computer to make sure that it is a sensible date, and it will then be accepted as your date of birth despite the fact that it is wrong. There is no reason for the computer to imagine that it may be wrong; quite simply, when you filled out the original form, you made a mistake.

The second type of error is when the operator typing in the data hits the wrong key and types in 181193, or the equivalent. In this case an error has been made that should be spotted if a suitable check is made on the input. This type of data checking is called a verification check.

The third type of error is when something is typed in which simply is not sensible. If the computer knows that there are only 12 months in a year, then it will know that 181383 must be wrong because it is not sensible to be born in the thirteenth month. Checks on the sensibility of the data are called validation checks.

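
The difference between these three kinds of error can be illustrated with a short sketch of a validation check on a six-digit date. The function name `is_valid_ddmmyy` and the assumption that two-digit years belong to the 1900s are illustrative choices only, not part of any particular system.

```python
from datetime import date


def is_valid_ddmmyy(text: str) -> bool:
    """Validation check: True if text is six digits that form a real DDMMYY date."""
    if len(text) != 6 or not (text.isascii() and text.isdigit()):
        return False
    day = int(text[:2])
    month = int(text[2:4])
    year = 1900 + int(text[4:])  # assumption: two-digit years mean 19YY
    try:
        date(year, month, day)  # raises ValueError for impossible dates
    except ValueError:
        return False
    return True


# The three entries discussed above: a correct-looking date, a mistyped
# but still sensible date, and a date with a thirteenth month.
for entry in ("181183", "181193", "181383"):
    print(entry, "accepted" if is_valid_ddmmyy(entry) else "rejected")
```

Only 181383 is rejected. The mistyped 181193 passes because it is still a sensible date, which is why a transcription error of this kind needs a verification check rather than a validation check.
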
Faulty Data

There is very little that can be done about faulty data except to let the owner of the data check it visually on a regular basis. The personal information kept on the school administration system about you and your family will be printed off at regular intervals so that your parents can check that the stored information is still correct.

Verification

Verification means checking the input data against the original data to make sure that there have been no transcription errors. The standard way to do this is to input the data twice to the computer system. The computer then compares the two sets of data (which should be the same), and if there is a difference between them, the computer knows that one of the inputs is wrong. It won’t know which one is wrong, but it can now ask the operator to check that particular input.

Validation

The first thing is to dispel a common misinterpretation of validation. In section 1.6.6 the checking of data was mentioned, specifically the use of parity bits. This is NOT validation. Parity bits and echoing back are techniques used to check that data has been transmitted properly within a computer system (e.g. from the disk drive to the processor), whereas validation checks are used to check the input of data to the system in the first place. Validation is a check on DATA INPUT to the system, made by comparing the data input with a set of rules that the computer has been told the data must follow. If the data does not match up with the rules, then there must be an error.

There are many different types of validation check that can be used to check input in different applications.

1. Range check. A mathematics exam is out of 100. A simple validation rule that the computer can apply to any data that is input is that the mark must be between 0 and 100 inclusive. Consequently, a mark of 101 would be rejected by this check as being outside the acceptable range.

2. Character check.
A person’s name will consist of letters of the alphabet and sometimes a hyphen or apostrophe. This rule can be applied to the input of a person’s name so that dav2d will immediately be rejected as unacceptable.

3. Format check. A particular application is set up to accept a national insurance number. Each person has a unique national insurance number, but they all have the same format of characters: 2 letters followed by 6 digits followed by a single letter. If the computer knows this rule, then it knows what the format of a NI number is and would reject ABC12345Z because it is in the wrong format; it breaks the rule.

4. Length check. A NI number has 9 characters. If more or fewer than 9 characters are keyed in, then the data cannot be accurate.

5. Existence check. A bar code is read at a supermarket check out till. The code is sent to the main computer, which will search for that code on the stock file. As the stock file contains details of all items held in stock, if the code is not there then the item cannot exist, which it obviously does; therefore the code must have been wrongly read.

6. Check digit. When the code is read on the item at the supermarket, it consists of numbers. One number is special; it is called the check digit. If the other numbers have some arithmetic done to them, using a simple algorithm, the answer should be this special digit. When the code is read at the check out till, if the arithmetic does not give the check digit then the code must have been read wrongly. If it does, the code is accepted, and it is at this point that the beeping sound is normally heard.

For example, suppose the number to be entered is 1374. A check digit may be calculated by giving the digits weights of 5, 4, 3 and 2 and summing the results.

5 x 1 = 5
4 x 3 = 12
3 x 7 = 21
2 x 4 = 8

The sum is 5 + 12 + 21 + 8 = 46.

The next step is to divide this sum by a suitable number and note the remainder. In this case we shall use 7. 46 divided by 7 gives a remainder of 4. Finally, subtract the remainder from the divisor (7 in this case) to produce the check digit. In this case the check digit is 7 – 4 = 3. This would be called a modulus 7 check digit. This means that the number to be entered is 13743.

A very common algorithm uses modulus 11 and, if the check digit works out as 10, X is used in its place. A good example of the use of modulus 11 is the 10-digit ISBN used on books.

3.3.5 Output Formats

When data has been processed by a computer system it is necessary to report the results of the processing. There are a number of different ways that the results can be reported to the user.

Graphs

Graphs show trends very clearly. Different types of graph can illustrate different characteristics, and when two variables need to be compared, a visual representation can be very useful. However, the choice of scales is of paramount importance, because otherwise a very misleading picture can be given. Also, specific values are not easily read from a graph; indeed, in a continuous distribution, it is simply not possible to take reliable readings to any degree of accuracy.

Reports

A report is a hard copy printout of the values of variables. This has the advantage of producing the actual figures according to the values specified by the user. However, the figures themselves may need skill to interpret, and the value of figures in a vacuum is often hard to justify.

Interactive Presentations

The previous forms have relied on the format of the report being decided without the luxury of being able to see what the figures look like in the first place. If the system allows the user to decide the type and range of output required during the run, then there is some positive user involvement, leading to an interactive presentation where the user can adjust the output to suit the example.

Sound

Many applications do not lend themselves to a standard, visual, printout.
Sound can be used for output from some systems. Obvious examples would be voice synthesis for reporting to blind people and an alarm system to protect property against burglars.

Video

Video is a visually satisfying form of output that takes large amounts of memory to produce, because the nature of the medium requires large quantities of pictures to produce the feel of continuous motion. Video is useful for demonstrating techniques, where there is little value in pages of instruction if a simple video can illustrate something better.

Images

Images, or pictures, can be used to enhance understanding. These may be created using graphics packages, scanned into the computer or imported from a camera. By using a number of slightly different images, animation can be created.

Animations

Animations provide a good stimulus for an audience and can lead from one slide to another in a slide based presentation. Animation takes considerably less processing power than other forms of motion, unless the image being animated is complex. Animation is used so often that it can come across as a boring technique that has just been added for ‘gloss’. Care needs to be taken when using animation, as it can detract from the presentation if it involves complex changes. A simple animation can be created by using just two images. For example, storing two pictures of a Christmas tree, one plain green and the other containing small coloured circles, and switching quickly between them can create the illusion of flashing lights.

3.3.6 Output According to Target Audience

Imagine an intensive care ward at a hospital. There are six beds, each with a patient who is being monitored by a computer, and the outputs are available for a variety of users. There is a nurse at a desk at one end of the ward. The nurse has other duties, but is expected to make the rounds of the patients to check on their progress at regular intervals.
Doctors come round the ward twice a day to check on the patients and make any adjustments to their medication.

If a patient is sensed by the computer system to have suffered a relapse while the nurse is sitting at the desk, a sensible output would be sound: some sort of alarm to alert the nurse to the fact that something is wrong. This may be accompanied by a flashing light, or some other device, to draw attention quickly to the patient who needs it.

When the nurse goes around the patients to make a visual check of their condition, it is not necessary to know exact figures for heart rate or blood sugar. A quick glance at a screen showing a scrolling graph of the patient’s vital signs over the last 20 minutes will be perfectly adequate. If the graph looks in any way abnormal, it may be necessary to get a printout of the actual values of the variables for that patient to determine what action, if any, needs to be taken.

The doctor may well want to see a printout of all the variable values for the last twenty four hours, particularly if something is happening to the patient which is difficult to understand; such historical data can hold the clue to present symptoms. The doctor may change the medication, or the parameters within which the patient can be considered to be stable. This will involve the nurse resetting values on the scales of the graphical output, or even resetting the parameters for setting off the audible alarm. In doing so, the nurse is using an interactive presentation with the system.

Once a week the nurse takes a first aid class at the local sixth form college. There are too many students for a one to one presentation all the time, so the college computer system has been loaded with demonstration software showing an animation of the technique for artificial respiration.

When considering output, always consider the importance of timeliness and relevance.
Data tends to have a limited life span, which can be different for the same data in different situations. The data on heart rate from 3 hours ago is not going to be of importance to the nurse looking after the patient, but it may be of great value to the doctor in providing a clue as to the reason for a sudden change in condition. Some data is not relevant to particular situations, however up to date it is. The fact that a patient has blue eyes has no bearing on their physical state and consequently should not be considered relevant to this example, although it may well be in other circumstances.

Example Questions

1. a) State two methods of data entry used by banks in their cheque system. (2)
b) Explain why banks find the use of your two examples suitable for this application. (4)

A. a)
-Keying in of data, either to a disk for later entry, or directly onto the cheque in machine readable form.
-MICR
b)
-Keying in can be used to place the details of the payee onto the cheque together with the amount,
-in machine readable form,
-or to store the data on a disk for future use.
-MICR is already in machine readable form
-placed on cheques at time of printing the cheque book
-means that complex figures like account number do not have to be keyed in which would invite human error.

Notes: Although the answer is given as bullet points, the response expected would be a prose explanation. Note that although MICR is not stated in the syllabus, the examiner expects you to have a working knowledge of it and other input types not stated. The syllabus actually says ‘including…’, in other words any input device may be alluded to in the question, although, in practice, the devices used in the exam will be common and relevant to a simple situation.

2. A small stall is to be opened, as part of a fairground, where the customer can have their likeness printed on to the front of a sweatshirt. Describe two possible methods of capturing the image to be printed. (4)

A.
-Digital camera
-connected to the computer which uses
-software to crop and present the image to the printer.
-Camcorder where the image is sent to..
-a video capture card which produces a still image on a screen in the same way as a digital camera does.

3. Explain the difference between free text and structured data. (2)

A.
-Free text data has no structure but simply consists of the characters from the character set.
-Structured data is characters from the character set in a framework created by the features available in the software.

Notes: There is very little that can be asked, either of this section, because of the simple definitions of the terms, or indeed of the last section which was covered by question 2. There is no need in bullet point 2 of the syllabus to explain the technicalities of how the systems work, so simple description of the techniques and an understanding of possible uses is all that can be expected.

4. A mail order firm receives orders from customers on paper order forms. These are keyed into the computer system by operators. The data that is to be keyed in includes the 5 digit article number, the name of the customer and the date that the order has been received.
a) Explain how the data input would be verified. (3)
b) Describe three different validation routines that could be performed on the data. (6)

A. a)
-Two operators would…
-independently key in the data
-The two copies of the data are then compared..
-by the software..
-and errors are reported to the operators.
b)
-Article number can have a length check carried out on it..
-if there are not 5 characters then the article number must be wrong.
-Name of customer can be checked with a character check..
-any characters other than letters or hyphen or apostrophe must mean that the check has been failed.
-The date can be subject to a range check (actually a number of range checks)..
-the first two characters must be less than 32.

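
The three validation routines named in the answer to 4(b) could be sketched roughly as follows. This is only an illustration: the function names are hypothetical, the date is assumed to be keyed with the day in its first two characters, a lower limit of 1 on the day is assumed, and a space is allowed in names so that a full name can be entered.

```python
def check_article_number(value: str) -> bool:
    """Length check: the article number must be exactly 5 digits."""
    return len(value) == 5 and value.isascii() and value.isdigit()


def check_customer_name(value: str) -> bool:
    """Character check: letters, hyphens and apostrophes only.

    A space is also accepted (an assumption) so that full names can be keyed.
    """
    return bool(value) and all(ch.isalpha() or ch in "-' " for ch in value)


def check_day_of_date(value: str) -> bool:
    """Range check: the first two characters must be a day from 01 to 31."""
    day = value[:2]
    if len(day) != 2 or not (day.isascii() and day.isdigit()):
        return False
    return 1 <= int(day) <= 31


print(check_article_number("40213"))   # five digits, so the check passes
print(check_customer_name("dav2d"))    # contains a digit, so the check fails
print(check_day_of_date("321183"))     # day 32 is out of range, so the check fails
```

Each routine pairs the type of check with the rule the data must follow, which is the two-part structure the answer above uses.
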
Notes: Do not use the verification technique of printing out the data so that the customer can check it. This is a mail order company and hence this would be impossible. There are many possible alternatives for part (b). Choose them carefully so that there are three different ones and so that all three pieces of data are used. Notice that there are two answers for each one, meaning that the examiner wants to know what the type of check is, and also the rule that has to be followed by the data to be treated as valid.

5. A reaction vessel in a chemical plant is monitored, along with many others, by a computer system using a number of sensors of different types. Describe three different types of output that would be used by such a system, stating why such a use would be necessary. (6)

A.
-Graphs (of the temperature, pressure..) showing the general state of the reaction vessel..
-to show the operator the trends in the vessel, for example showing clearly whether the temperature is increasing.
-Report (of temperature)..
-while the graph shows a trend, the report gives precise figures.
-Sound
-an alarm would sound if the temperature went past a safe limit.
-Hard copy printout..
-to allow investigators to study problems that may cause a shutdown or unacceptably poor product.

Note: While knowledge of a chemical reaction is not part of a computing syllabus, it is reasonable to expect students to realise that heat, or some other sensible parameter, would play an important part in the reaction.

6. Explain what is meant by the timeliness and relevance of data. (2)

A.
-Timeliness is the concept that data changes over time and that data is only part of a sensible solution for a short period before it becomes outdated.
-Relevance of data means that data has a bearing, or use, in that particular application.

Note: These are simple definition answers. Having said that, the exact answers given here would not be expected; any answer that has the essence of the correct answer would be acceptable.
Examiners are aware that students take these papers under a lot of pressure and in a question like this will do their best to compensate as long as the germ of the answer is there. |
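
As a closing illustration, the weighted modulus 7 check digit described in section 3.3.4 can be sketched in a few lines. For 1374 with weights 5, 4, 3 and 2 the weighted sum is 5 + 12 + 21 + 8 = 46, the remainder on division by 7 is 4, and the check digit is 7 – 4 = 3. The function names are hypothetical, and mapping a remainder of 0 to a check digit of 0 (rather than 7) is an assumption made here so that the result is always a single digit below the modulus.

```python
def check_digit(number: str, weights=(5, 4, 3, 2), modulus=7) -> int:
    """Return the weighted check digit for a string of digits."""
    if len(number) != len(weights) or not (number.isascii() and number.isdigit()):
        raise ValueError("number must have one digit per weight")
    # Multiply each digit by its weight and add up the results.
    total = sum(w * int(d) for w, d in zip(weights, number))
    remainder = total % modulus
    # Subtract the remainder from the modulus; the final % maps a
    # remainder of 0 to check digit 0 (an assumption, see lead-in).
    return (modulus - remainder) % modulus


def is_valid(code: str) -> bool:
    """Validation check: does the last digit match the recomputed check digit?"""
    if len(code) != 5 or not (code.isascii() and code.isdigit()):
        return False
    return check_digit(code[:4]) == int(code[4])


print(check_digit("1374"))  # weighted sum 46, remainder 4, check digit 3
print(is_valid("13743"))    # True: the code was read correctly
print(is_valid("17343"))    # False: two digits have been swapped
```

The last call shows the purpose of the weights: swapping the 3 and the 7 changes the weighted sum from 46 to 50, so the recomputed check digit no longer matches and the misread code is rejected.
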