Fri 29 March 2024
Baydari.com

Data Validation

The process of comparing data to a set of rules or values to find out if the data is correct is called data validation. Many programs perform validity checks to ensure the correctness of data. The validity checks are also called validity rules.

For example, when the user enters an email address, it should contain both '@' and '.'. Otherwise, it is not a valid email address. Validating data before storing it in a file improves the integrity of the data.
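The email rule above can be sketched in a few lines of Python. The function name looks_like_email is hypothetical, and the rule is only the minimal one stated in the text: passing it does not prove that an address really exists.

```python
def looks_like_email(text: str) -> bool:
    """Apply the simple rule from the text: the value must contain '@' and '.'."""
    return "@" in text and "." in text
```

A value such as "ali@example.com" passes this check, while "ali.example.com" is rejected because it has no '@'.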

There are many types of validity checks. Some important types are as follows:

 

1. Alphabetic/Numeric Check

An alphabetic check ensures that the user enters only alphabetic data in a field. A numeric check ensures that the user enters only numeric data in a field. For example, the data in the Name field should be alphabetic and the data in the Marks field should be numeric.
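As a rough illustration, both checks can be written with Python's built-in string methods. The function names are hypothetical, and allowing spaces between the parts of a name is an assumption made here, not a rule from the text.

```python
def is_alphabetic(value: str) -> bool:
    """Alphabetic check: every word must consist of letters only."""
    parts = value.split()  # assumption: spaces between name parts are allowed
    return bool(parts) and all(part.isalpha() for part in parts)


def is_numeric(value: str) -> bool:
    """Numeric check: the value must consist of digit characters only."""
    # str.isdigit() is False for "", "8.5" and "-3", so only whole,
    # unsigned numbers pass this simple version of the check.
    return value.isdigit()
```

With these definitions, is_alphabetic("Ali Khan") and is_numeric("85") succeed, while is_alphabetic("Ali123") and is_numeric("8.5") fail.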

 

2. Range Check

The range check determines whether the entered number is within the specified range. For example, if the maximum marks of a subject are 100, a range check on the Marks field ensures that the data entered in the field is between 0 and 100.
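A range check on the Marks field might look like the following sketch. The function name and its default limits are illustrative; the limits 0 and 100 come from the example in the text.

```python
def in_range(marks: int, low: int = 0, high: int = 100) -> bool:
    """Range check: accept the value only if low <= marks <= high."""
    return low <= marks <= high
```

Here in_range(85) is accepted, while in_range(101) and in_range(-1) are rejected.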

 

3. Consistency Check

A consistency check tests the data to ensure that it is logical. For example, the Admission Date field should not contain a date earlier than the current date. Similarly, the value in the Leave Date field should not be earlier than the value in the Admission Date field.
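The second rule above, that the Leave Date must not come before the Admission Date, can be sketched with Python's standard datetime module. The function name is hypothetical.

```python
from datetime import date


def dates_are_consistent(admission: date, leave: date) -> bool:
    """Consistency check: the leave date must not be earlier than admission."""
    return leave >= admission
```

For instance, an admission on 1 September 2023 with a leave date of 30 June 2024 is consistent, while swapping the two dates is not.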

 

4. Completeness Check

A completeness check ensures that a field contains data. This check stops the user from leaving a field blank. For example, a completeness check on the Registration ID field ensures that the user has entered the registration number of the student.
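A minimal sketch of a completeness check follows. The function name is hypothetical, and treating a value made only of spaces as blank is an assumption.

```python
def is_complete(value) -> bool:
    """Completeness check: reject missing values and blank strings."""
    # Assumption: a field holding only whitespace counts as left blank.
    return value is not None and str(value).strip() != ""
```

A Registration ID such as "REG-001" passes, while an empty or whitespace-only entry does not.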

 

File Processing System

In the past, many organizations stored data in files on tape or disk. The data was managed using a file processing system. In a typical file processing system, each department in an organization has its own set of files. The files are designed specifically for that department's own applications. The records in one file are not related to the records in any other file.

Business organizations have used file processing systems for many years, but this approach has many disadvantages.

 

Types of File Organization

Different types of file organizations are as follows:

 

1. Sequential File Organization

In a sequential file organization, records are stored one after the other. The records are normally stored in ascending or descending order. This order is based on a value called a key. A key is a field that contains unique data, e.g. Registration No or NIC No. The value of the key field is different in each record in the file.

The records in a sequential file organization are retrieved sequentially. This type of retrieval is known as sequential access. It means that records are accessed one after the other in the same sequence in which they are stored in the file.

The major disadvantage of sequential access is that it is very slow. If the last record is to be retrieved, all preceding records must be read before reaching it.
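The cost of sequential access can be seen in a short sketch that counts how many records are read before the wanted one is found. The record layout (a reg_no key and a name) and the function name are made up for illustration.

```python
def sequential_find(records, key):
    """Read records in stored order; return (record, number_of_reads)."""
    reads = 0
    for record in records:
        reads += 1  # every record up to the match has to be read
        if record["reg_no"] == key:
            return record, reads
    return None, reads  # key not present: the whole file was read


# Five sample records stored in ascending key order.
records = [{"reg_no": n, "name": f"Student {n}"} for n in range(1, 6)]
```

Looking up the last key, sequential_find(records, 5), reads all five records, whereas looking up key 1 reads only one.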

 

2. Indexed File Organization

In an indexed file organization, records are stored in ascending or descending order. The order is based on a value called a key, as in a sequential file organization. Additionally, an indexed file organization maintains an index in a file.

An index consists of key values and the corresponding disk address for each record in the file. Each index entry refers to the place on the disk where a record is stored. The index is updated whenever a record is added to or deleted from the file. When the file is processed, the index is retrieved from the disk and copied to the main memory.

The records in an indexed file organization can be accessed sequentially as well as by direct access, also called random access. Direct or random access means that any record can be accessed directly without reading the preceding records. When a record is to be retrieved, its key value is looked up in the index. The index also contains the corresponding address on the disk where the record is stored. This address is then used to access the record on the disk. Direct or random access is faster than sequential access.
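The idea of an index can be sketched with an in-memory file, where a byte offset plays the role of the disk address. Everything here is illustrative: the function names, the comma-separated record layout and the sample data are assumptions, and io.BytesIO stands in for a real file on disk.

```python
import io


def build_file_and_index(rows):
    """Write one line per record and remember where each record starts."""
    data = io.BytesIO()
    index = {}
    for key, name in sorted(rows):  # records are stored in key order
        index[key] = data.tell()    # offset of this record in the file
        data.write(f"{key},{name}\n".encode("utf-8"))
    return data, index


def fetch(data, index, key):
    """Direct access: jump to the stored offset and read one record."""
    data.seek(index[key])
    line = data.readline().decode("utf-8").rstrip("\n")
    found_key, name = line.split(",", 1)
    return int(found_key), name


data, index = build_file_and_index([(103, "Sara"), (101, "Ali"), (102, "Bilal")])
```

Calling fetch(data, index, 102) moves straight to the record for key 102 without reading the record for key 101 first.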

 

3. Direct File Organization

In direct file organization, the key value of a record is used to determine the location to store the record. Suppose a program establishes a file that has nine locations to store records. If the key in the record is a one-digit value, this value can be used to specify the location to store that record. For example, the record with key 5 can be stored at relative location 5, the record with key 3 at relative location 3, and so on. The relative location is also known as a bucket.

The implementation of direct file organization becomes complex in some situations. Suppose the maximum number of records to be stored is 100 and the key for a record is a four-digit number. A four-digit key can take values up to 9999. In this situation, the key cannot be used directly to specify the relative location. Instead, a mathematical formula can be used to find the relative location. This method is known as hashing.

A commonly used hashing technique is called the division/remainder method. This method uses a prime number that is close to, but not greater than, the number of records. Suppose there are 100 records to be stored. The closest prime number is 97. The key of the record is divided by 97 and the remainder from the division is used as the relative location for storing the record.
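The division/remainder method can be tried out with a short sketch. The prime 97 comes from the example above; the function names, the sample keys and the use of a dictionary of lists as buckets are assumptions made for illustration.

```python
PRIME = 97  # largest prime not greater than 100, as in the example above


def relative_location(key: int, prime: int = PRIME) -> int:
    """Division/remainder hashing: the remainder is the relative location."""
    return key % prime


def store(buckets: dict, key: int, record: str) -> int:
    """Place a record in the bucket chosen by its key; return the location."""
    location = relative_location(key)
    buckets.setdefault(location, []).append((key, record))
    return location
```

For example, key 1234 leaves remainder 70 when divided by 97, so its record goes to relative location 70, and key 9999 goes to location 8. Two different keys can produce the same remainder (1234 and 1331 both give 70), which is why this sketch lets a bucket hold more than one record.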

 

Disadvantages of File Processing System

The most important disadvantages of the file processing system are as follows:

 

1. Data Redundancy and Inconsistency

In the file processing system, the same data may be duplicated in several files. For example, there are two files, "Students" and "Library". The "Students" file contains the Roll No, name, address, telephone number and other details of all students in a college. The "Library" file contains the Roll No and name of those students who borrow a book from the library, along with information about the borrowed books. The data of one student appears in two files. This is known as data redundancy. This redundancy increases storage requirements.

This situation can also result in data inconsistency. It means that two files may contain different data about the same student. For example, if the address of a student is changed, it must be changed in both files. There is a possibility that it is changed in the "Students" file but not in the "Library" file. In this case, the data of the student becomes inconsistent.
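The following sketch shows how redundancy can turn into inconsistency. Two dictionaries stand in for the "Students" and "Library" files; the sample data, the function names and the assumption that both files store the address are illustrative only.

```python
# The same student's address is stored twice (data redundancy).
students = {101: {"name": "Ali", "address": "12 Old Road"}}
library = {101: {"name": "Ali", "address": "12 Old Road", "book": "Databases"}}


def change_address_in_students_only(roll_no: int, new_address: str) -> None:
    """Update one copy of the data and forget the other."""
    students[roll_no]["address"] = new_address


def inconsistent_roll_numbers() -> list:
    """Roll numbers whose address differs between the two files."""
    return [
        roll_no
        for roll_no in students
        if roll_no in library
        and students[roll_no]["address"] != library[roll_no]["address"]
    ]
```

After change_address_in_students_only(101, "7 New Street"), the call inconsistent_roll_numbers() reports roll number 101, because the "Library" copy still holds the old address.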

 

2. Data Isolation

The data in the file processing system is stored in various files. It becomes very difficult to write new application programs to retrieve the appropriate data. Suppose that student emails are stored in the "Students" file and fee information is stored in the "Fee" file. To send an email message informing a student that the date for fee payment has passed, you need data from both files. In the file processing system, it is difficult to generate this type of list from multiple files.
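To give a feel for the extra work involved, the sketch below combines two small comma-separated files by hand to find the emails of students who have not paid. The file contents, the column names (roll_no, email, paid) and the function name are all hypothetical.

```python
import csv
import io

# Two separate "files", each with its own layout.
STUDENTS_FILE = "roll_no,email\n101,ali@example.com\n102,sara@example.com\n"
FEE_FILE = "roll_no,paid\n101,yes\n102,no\n"


def emails_with_unpaid_fee(students_text: str, fee_text: str) -> list:
    """Match records from both files on roll_no and keep unpaid students."""
    emails = {
        row["roll_no"]: row["email"]
        for row in csv.DictReader(io.StringIO(students_text))
    }
    return [
        emails[row["roll_no"]]
        for row in csv.DictReader(io.StringIO(fee_text))
        if row["paid"] == "no" and row["roll_no"] in emails
    ]
```

Every new report of this kind needs its own matching code written against the exact layout of each file.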

 

3. Integrity Problems

Integrity means the reliability and accuracy of data. The stored data must satisfy certain types of consistency constraints. For example, the Roll No and Marks of a student should be numeric values. It is very difficult to apply such constraints to files in a file processing system.

 

4. Program Data Dependency

Program data dependency is a relationship between the data stored in files and the specific programs required to update and maintain those files. With the file processing system, application programs are developed according to a particular file format. If the format of the underlying file is changed, the application program also needs to be changed accordingly. For example, if there is any change in the length of the postal code, it requires a change in the program. Such changes may be costly to implement.
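The postal code example can be made concrete with a program that reads fixed-width records. The field widths, the field order (name, postal code, city) and the function name are assumptions chosen for illustration.

```python
NAME_WIDTH = 10    # the program hard-codes the file layout
POSTAL_WIDTH = 5   # postal codes are assumed to be exactly 5 characters


def parse_record(line: str) -> dict:
    """Cut a fixed-width line into fields using the hard-coded widths."""
    name = line[:NAME_WIDTH].rstrip()
    postal_code = line[NAME_WIDTH:NAME_WIDTH + POSTAL_WIDTH]
    city = line[NAME_WIDTH + POSTAL_WIDTH:].rstrip()
    return {"name": name, "postal_code": postal_code, "city": city}
```

A record written with a 5-character postal code is parsed correctly. If the file starts storing 6-character postal codes, the same program cuts the code short and pushes the extra character into the city field until POSTAL_WIDTH is changed in the program.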

 

5. Atomicity Problem

When you perform an operation on data, it may consist of several steps. The collection of all steps required to complete a process is known as a transaction. Atomicity means that a transaction should either take place as a whole or not take place at all. Suppose you want to transfer money from account A to account B. This process consists of two steps:

1. Deduct the money from account A.

2. Add the money to account B.

Suppose that the system fails when the computer has performed the first step. It means that the amount has been deducted from account A but has not been added to account B. This situation can make your data inconsistent. The file processing system does not provide the facility to ensure the atomicity of data.
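The failure described above can be simulated in a small sketch. The function names, the SystemFailure exception and the fail_midway flag are hypothetical. The all-or-nothing version simply restores an in-memory copy of the balances, which only illustrates the idea; it is not how a real system survives a crash.

```python
class SystemFailure(Exception):
    """Stands in for a system crash between the two steps."""


def transfer(accounts, source, target, amount, fail_midway=False):
    """Non-atomic transfer: a failure after step 1 leaves the data half updated."""
    accounts[source] -= amount  # step 1: deduct the money from account A
    if fail_midway:
        raise SystemFailure("failure after step 1")
    accounts[target] += amount  # step 2: add the money to account B


def atomic_transfer(accounts, source, target, amount, fail_midway=False):
    """All-or-nothing transfer: undo step 1 if step 2 never happens."""
    snapshot = dict(accounts)  # balances before the transaction started
    try:
        transfer(accounts, source, target, amount, fail_midway)
    except SystemFailure:
        accounts.clear()
        accounts.update(snapshot)  # restore the original balances
        raise
```

With balances of 500 in A and 100 in B, a failed transfer of 200 leaves A at 300 and B at 100, so 200 has vanished. The same failure inside atomic_transfer leaves both balances unchanged.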

 

6. Security Problems

The file processing system does not provide adequate security for data. In some situations, you may want to provide different types of access permission to data for different users. For example, a data entry operator should only be allowed to enter data. The chairman of the organization should be able to access or delete the data completely. Such types of security options are not available in the file processing system.

 

7. Program Maintenance

The programs developed in the file processing system are difficult to maintain. A large part of the budget is spent on program maintenance. It also becomes difficult to develop new programs.

