Spatial Databases

June 12, 2012 • Posted by

Although the two terms, data and information, are often used interchangeably, each has a specific meaning. Data can be described as different observations that are collected and stored. Information is data that are useful in answering queries or solving problems. Digitizing a large number of maps yields a large amount of data after hours of painstaking work, but those data only yield useful information if they are used in analysis. 

Spatial and Non-spatial data 
Geographic data are organised in a geographic database. This database can be considered a collection of spatially referenced data that acts as a model of reality. Each feature in the database has two important components: its geographic position and its attributes or properties; in other words, spatial data (where is it?) and attribute data (what is it?). 

Attribute Data
The attributes refer to the properties of spatial entities. They are often referred to as non-spatial data since they do not in themselves represent location information. 

District name, area, and population:
  • Noida: area 395 sq. km, population 6,75,341
  • Ghaziabad: area 385 sq. km, population 2,57,086
  • Mirzapur: area 119 sq. km, population 1,72,952
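The attribute records above can be held as plain records keyed by district name. This minimal Python sketch shows that useful information, such as population density, can be derived from attribute data alone, without any location. The field names `area_sq_km` and `population` are illustrative assumptions.

```python
# Attribute (non-spatial) data for the districts listed above, keyed by name.
# Field names are illustrative; area is in sq. km, population is a head count.
districts = {
    "Noida": {"area_sq_km": 395, "population": 675_341},
    "Ghaziabad": {"area_sq_km": 385, "population": 257_086},
    "Mirzapur": {"area_sq_km": 119, "population": 172_952},
}


def population_density(record: dict) -> float:
    """Return persons per square kilometre for one attribute record."""
    return record["population"] / record["area_sq_km"]


for name, record in districts.items():
    print(f"{name}: {population_density(record):,.1f} persons per sq. km")
```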

Spatial data 
Geographic position refers to the fact that each feature has a location that must be specified in a unique way. To specify the position in an absolute way a coordinate system is used. For small areas, the simplest coordinate system is the regular square grid. For larger areas, certain approved cartographic projections are commonly used. Internationally there are many different coordinate systems in use. 
Geographic objects can be represented in four ways, namely points, lines, areas, and continuous surfaces. 
Point Data: Points are the simplest type of spatial data. They are zero-dimensional objects with a position in space but no length. 
Line Data: Lines (also termed segments or arcs) are one-dimensional spatial objects. Besides having a position in space, they also have a length. 
Area Data: Areas (also termed polygons) are two-dimensional spatial objects with not only a position in space and a length but also a width (in other words they have an area). 
Continuous Surfaces: Continuous surfaces are three-dimensional spatial objects with not only a position in space, a length, and a width, but also a depth or height (in other words, they have a volume). They are not discussed further here because most GIS do not handle true volumetric spatial data. 
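The point, line, and area types described above differ in the measures they support. A minimal Python sketch, assuming planar coordinates (for example metres on a regular square grid), models each type and computes line length and polygon area with the shoelace formula. The class names are hypothetical, not those of any particular GIS.

```python
import math
from dataclasses import dataclass

# Assumption: coordinates are planar (e.g. metres on a square grid),
# so Euclidean distance and planar area are meaningful.
Coord = tuple[float, float]


@dataclass
class Point:
    """Zero-dimensional: a position only."""
    xy: Coord


@dataclass
class Line:
    """One-dimensional: an ordered sequence of vertices that has a length."""
    vertices: list[Coord]

    def length(self) -> float:
        # Sum of the straight segments between consecutive vertices.
        return sum(math.dist(a, b) for a, b in zip(self.vertices, self.vertices[1:]))


@dataclass
class Area:
    """Two-dimensional: a closed ring of vertices enclosing an area."""
    ring: list[Coord]  # the first vertex is not repeated at the end

    def perimeter(self) -> float:
        return Line(self.ring + self.ring[:1]).length()

    def area(self) -> float:
        # Shoelace formula; abs() makes the result independent of vertex order.
        n = len(self.ring)
        twice_area = sum(
            self.ring[i][0] * self.ring[(i + 1) % n][1]
            - self.ring[(i + 1) % n][0] * self.ring[i][1]
            for i in range(n)
        )
        return abs(twice_area) / 2


well = Point((2.0, 3.0))
road = Line([(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)])
field = Area([(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)])
print(road.length(), field.perimeter(), field.area())
```

A point stores only a position; a line adds length; an area adds enclosed area, which is exactly the progression of dimensions in the text.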

Geographic Data: Linkages and Matching

Linkages

A GIS typically links different data sets. Suppose you want to know the mortality rate from cancer among children under 10 years of age in each country. If you have one file that contains the number of children in this age group and another that contains the number of cancer deaths in the same age group, you must first combine, or link, the two data files. Once this is done, you can divide the number of deaths by the number of children to obtain the desired answer. 
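The link-then-divide step can be sketched in a few lines of Python. This assumes two hypothetical files, one holding the number of children under 10 and one holding cancer deaths in that age group, both keyed by country; the numbers are invented, and expressing the rate per 100,000 children is an assumed convention.

```python
# Two hypothetical "files" keyed by country; the figures are invented.
children_under_10 = {"Country A": 2_000_000, "Country B": 500_000}
cancer_deaths_under_10 = {"Country A": 150, "Country B": 60}


def mortality_rate(population: dict, deaths: dict, per: int = 100_000) -> dict:
    """Link both files on the country key, then divide deaths by population."""
    linked = {
        country: (population[country], deaths[country])
        for country in population.keys() & deaths.keys()  # countries in both files
    }
    # Multiply before dividing to express the rate per `per` children.
    return {c: d * per / p for c, (p, d) in linked.items()}


rates = mortality_rate(children_under_10, cancer_deaths_under_10)
print(rates)
```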

Exact Matching

Exact matching occurs when you have information in one computer file about many geographic features (e.g., towns) and additional information in another file about the same set of features. Bringing them together is easily achieved by using a key common to both files, in this case the town name. The records in each file with the same town name are extracted, joined, and stored in another file. 
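Exact matching is what database software calls an inner join on a shared key. A minimal Python sketch, assuming two hypothetical files of town records with the town name as the key (town names and fields are invented), builds an index on one file and merges each matching record from the other:

```python
# Two hypothetical files describing the same towns; names and fields are invented.
file_a = [
    {"town": "Alpha", "population": 12_000},
    {"town": "Beta", "population": 8_500},
]
file_b = [
    {"town": "Beta", "hospitals": 2},
    {"town": "Alpha", "hospitals": 3},
]


def exact_match(left: list[dict], right: list[dict], key: str) -> list[dict]:
    """Join records whose key values are identical (an inner join)."""
    index = {record[key]: record for record in right}  # assumes unique keys
    joined = []
    for record in left:
        match = index.get(record[key])
        if match is not None:
            joined.append({**record, **match})  # combined record for the new file
    return joined


combined = exact_match(file_a, file_b, key="town")
print(combined)
```

Records whose key has no partner in the other file are simply dropped, which is why the key must be spelled identically in both files.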

Hierarchical Matching 

Some types of information, however, are collected in more detail and less frequently than others. For example, financial and unemployment data covering large areas are collected quite frequently, whereas population data are collected for small areas but at less frequent intervals. If the smaller areas nest (i.e., fit exactly) within the larger ones, the data can be made to refer to the same areas by hierarchical matching: add the data for the small areas together until the grouped areas match the larger ones, and then match them exactly. 
The hierarchical structure illustrated in the chart shows that this city is composed of several tracts. To obtain meaningful values for the city, the tract values must be added together. 
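Hierarchical matching amounts to aggregating the small areas up to the larger ones and then matching exactly. A minimal Python sketch, assuming hypothetical census tracts that each nest within one city and invented city-level unemployment counts:

```python
from collections import defaultdict

# Hypothetical tracts, each nesting within exactly one city; figures are invented.
tract_population = {"T1": 4_000, "T2": 3_500, "T3": 2_500, "T4": 6_000}
tract_to_city = {"T1": "City X", "T2": "City X", "T3": "City X", "T4": "City Y"}

# Hypothetical data collected only for the larger areas.
city_unemployed = {"City X": 600, "City Y": 450}


def aggregate(values: dict, parent_of: dict) -> dict:
    """Sum small-area values into the larger areas they nest within."""
    totals = defaultdict(int)
    for unit, value in values.items():
        totals[parent_of[unit]] += value
    return dict(totals)


city_population = aggregate(tract_population, tract_to_city)
# Both data sets now refer to the same areas, so they can be matched exactly.
unemployment_rate = {
    city: city_unemployed[city] / city_population[city] for city in city_unemployed
}
print(city_population, unemployment_rate)
```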

Fuzzy Matching 

Often, the boundaries of the smaller areas do not match those of the larger ones. This happens frequently with environmental data. For example, crop boundaries, usually defined by field edges, rarely match the boundaries between soil types. If you want to determine the most productive soil for a particular crop, you need to overlay the two data sets and compute crop productivity for each soil type. In principle, this is like laying one map over another and noting the combinations of soil type and productivity. 
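The overlay described above is usually done with vector polygons, but the idea is easiest to see with a raster simplification. This sketch assumes both maps have been converted to co-registered grids with the same rows, columns, and cell size; the soil classes and yields (t/ha) are invented. Each cell pairs one soil type with one yield, and yields are averaged per soil type:

```python
from collections import defaultdict

# Two co-registered rasters of the same area; classes and yields are invented.
soil = [
    ["clay", "clay", "loam"],
    ["clay", "loam", "loam"],
    ["sand", "sand", "loam"],
]
crop_yield = [
    [2.0, 2.5, 4.0],
    [3.0, 4.5, 5.0],
    [1.0, 1.5, 4.5],
]


def mean_yield_by_soil(soil_grid, yield_grid) -> dict:
    """Overlay two grids cell by cell and average the yield for each soil type."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for soil_row, yield_row in zip(soil_grid, yield_grid):
        for soil_type, value in zip(soil_row, yield_row):
            sums[soil_type] += value
            counts[soil_type] += 1
    return {s: sums[s] / counts[s] for s in sums}


result = mean_yield_by_soil(soil, crop_yield)
best = max(result, key=result.get)  # the most productive soil type
print(result, best)
```

Because every cell has exactly one soil value and one yield value, the mismatch between field edges and soil boundaries disappears at the cost of cell-size precision.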
A GIS can carry out all these operations because it uses geography as a common key between the data sets. Information is linked only if it relates to the same geographical area. 
Why is data linkage so important? Consider a situation where you have two data sets for a given area, such as yearly income by county and average cost of housing for the same counties. Each data set might be analysed and/or mapped individually, or the two may be combined. With two data sets, only one combination exists; as more data sets are added, the number of possible combinations grows rapidly. Even if each combination is meaningful for only a single query, you will still be able to answer many more questions than if the data sets were kept separate. By bringing them together, you add value to the database. To do this, you need a GIS. 
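The growth in possible combinations is simple combinatorics: n data sets give n(n-1)/2 distinct pairs, and 2^n - n - 1 ways to combine two or more of them. A short Python check:

```python
from math import comb


def pairwise_combinations(n: int) -> int:
    """Number of distinct pairs that can be formed from n data sets."""
    return comb(n, 2)


def all_combinations(n: int) -> int:
    """Number of ways to combine two or more of n data sets (all subsets
    minus the empty set and the n single data sets)."""
    return 2**n - n - 1


for n in (2, 5, 10):
    print(n, pairwise_combinations(n), all_combinations(n))
```

With two data sets both counts are 1; with ten data sets there are 45 pairs and 1,013 combinations of two or more.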

Principal Functions of GIS
Data Capture
Data used in GIS come from many sources and are stored in different ways. A GIS provides tools and methods for integrating these data into a common format so that they can be compared and analysed. Data are mainly obtained from manual digitization and scanning of aerial photographs and paper maps, and from existing digital data sets. Remote-sensing satellite imagery and GPS are promising data input sources for GIS. 
Database Management and Update 
After data are collected and integrated, the GIS must provide facilities to store and maintain them. Effective data management has many definitions but should include all of the following aspects: data security, data integrity, data storage and retrieval, and data maintenance. 

Geographic Analysis 
Data integration and conversion are only part of the input phase of GIS. What is required next is the ability to interpret and analyse the collected information quantitatively and qualitatively. For example, a satellite image can help an agricultural scientist project crop yield per hectare for a particular region. For the same region, the scientist also has rainfall data for the past six months, collected through weather station observations, and a soil map showing fertility and suitability for agriculture. The rainfall observations are point data; they can be interpolated to produce a thematic map showing isohyets (contour lines of equal rainfall).
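The text does not say which interpolation method is used. Inverse distance weighting (IDW) is one common choice, swapped in here as an assumption: each estimate is a weighted average of station values, with weights 1/d^p (p = 2 assumed). The station coordinates (km) and rainfall totals (mm) below are hypothetical.

```python
import math

# Hypothetical weather-station observations: ((x, y) in km, rainfall in mm).
stations = [
    ((0.0, 0.0), 120.0),
    ((10.0, 0.0), 80.0),
    ((0.0, 10.0), 100.0),
]


def idw(x: float, y: float, observations, power: float = 2.0) -> float:
    """Inverse distance weighted estimate of rainfall at (x, y)."""
    num = 0.0
    den = 0.0
    for (sx, sy), value in observations:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # exactly at a station: use the observed value
        w = 1.0 / d**power
        num += w * value
        den += w
    return num / den


# Interpolate onto a coarse 5 km grid (rows are y, columns are x);
# contouring such a grid is what produces the isohyets.
grid = [[idw(x, y, stations) for x in range(0, 11, 5)] for y in range(0, 11, 5)]
for row in grid:
    print([round(v, 1) for v in row])
```

IDW never produces values outside the range of the observations, which keeps the rainfall surface plausible but smooths out any peaks between stations.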

Presenting Results 
One of the most exciting aspects of GIS technology is the variety of ways in which information can be presented once it has been processed. Traditional methods of tabulating and graphing data can be supplemented by maps and three-dimensional images, and a diverse range of output options is available for visual communication. 

Data Capture: An Introduction 
The functionality of GIS relies on the quality of data available, which, in most developing countries, is either redundant or inaccurate. Although GIS are being used widely, effective and efficient means of data collection have yet to be systematically established. The true value of GIS can only be realized if the proper tools to collect spatial data and integrate them with attribute data are available. 

Manual Digitization
Manual digitizing is still the most common method for entering maps into a GIS. The map to be digitized is affixed to a digitizing table, and a pointing device (called the digitizing cursor or mouse) is used to trace the features of the map. These features can be boundary lines between mapping units, other linear features (rivers, roads, etc.), or point features (sampling points, rainfall stations, etc.). The digitizing table electronically encodes the position of the cursor with a precision of a fraction of a millimeter. The most common type of digitizing table uses a fine grid of wires embedded in the table: the vertical wires record the X-coordinates, and the horizontal ones the Y-coordinates. 

The range of digitized coordinates depends on the density of the wires (called the digitizing resolution) and the settings of the digitizing software. A digitizing table normally has a rectangular active area in the middle, separated from the outer edge of the table by a small rim. Outside this so-called active area, no coordinates are recorded. The lower left corner of the active area has the coordinates x = 0 and y = 0. Therefore, make sure that the map, or the part of it that you want to digitize, always lies within the active area. 
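The relationship between resolution, the origin at the lower left corner, and the active area can be sketched as follows. The tablet values are assumptions for illustration only: a resolution of 40 lines per millimetre and an active area of 900 mm by 600 mm, with raw positions expressed as integer wire counts from the origin. Real tablets and their drivers differ, and a real tablet simply sends nothing outside the active area; the check here only mimics that behaviour.

```python
# Assumed tablet characteristics (hypothetical values, for illustration only).
LINES_PER_MM = 40          # digitizing resolution: 1 count = 0.025 mm
ACTIVE_WIDTH_MM = 900.0    # active area, origin (0, 0) at its lower left corner
ACTIVE_HEIGHT_MM = 600.0


def counts_to_table_mm(count_x: int, count_y: int) -> tuple[float, float]:
    """Convert raw wire counts from the origin into table coordinates in mm."""
    return count_x / LINES_PER_MM, count_y / LINES_PER_MM


def in_active_area(x_mm: float, y_mm: float) -> bool:
    """True if the point lies inside the active area, where coordinates are recorded."""
    return 0.0 <= x_mm <= ACTIVE_WIDTH_MM and 0.0 <= y_mm <= ACTIVE_HEIGHT_MM


raw_points = [(4_000, 2_000), (37_000, 1_000)]
for cx, cy in raw_points:
    x, y = counts_to_table_mm(cx, cy)
    status = "recorded" if in_active_area(x, y) else "outside active area"
    print(f"({x:.3f}, {y:.3f}) mm: {status}")
```

A denser wire grid (a larger `LINES_PER_MM`) means each count represents a smaller distance, which is why resolution determines how finely positions are encoded.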

Scanning System 
The second method of obtaining vector data is scanning. Scanning (or scan digitizing) provides a quicker means of data entry than manual digitizing. In scanning, a digital image of the map is produced by moving an electronic detector across the map surface. The output of a scanner is a digital raster image, consisting of a large number of individual cells ordered in rows and columns. For conversion to vector format, two types of raster image can be used. 
In the case of choropleth or other thematic maps, such as geological maps, the individual mapping units can be separated by the scanner according to their different colours or grey tones. The result is a colour or grey-tone image. 
In the case of scanned line maps, such as topographic maps, the result is a black-and-white image: black lines are converted to a value of 1, and the white areas between the lines obtain a value of 0. Such images, with only two possible values (1 or 0), are called binary images. 
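Producing a binary image from a grey-tone scan can be sketched as a simple threshold. This assumes 8-bit grey values (0 = black, 255 = white) and a fixed threshold of 128; actual scanner software may choose thresholds differently. The tiny image below is invented.

```python
# A tiny hypothetical grey-tone scan: 0 = black, 255 = white.
scan = [
    [250, 30, 245],
    [240, 20, 250],
    [235, 40, 60],
]


def to_binary(image, threshold: int = 128):
    """Black line pixels (darker than the threshold) become 1; white background becomes 0."""
    return [[1 if value < threshold else 0 for value in row] for row in image]


binary = to_binary(scan)
for row in binary:
    print(row)
```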
The raster image is processed by a computer to improve the image quality and is then edited and checked by an operator. It is then converted into vector format by special computer programmes, which are different for colour/grey tone images and binary images. 
Scanning works best with maps that are very clean and simple, relate to one feature only, and contain no extraneous information such as text or graphic symbols. For example, a contour map should contain only the contour lines, without height labels, drainage network, or infrastructure. In most cases such maps are not available and would have to be drawn especially for the purpose of scanning. Scanning and conversion to vector format are therefore beneficial only in large organizations, where a large number of complex maps are entered. In most other cases, manual digitizing will be the only practical method for entering spatial data in vector format. 

Data Conversion 
While manipulating and analysing data, the same format should be used for all data. This implies that when different layers are to be used simultaneously, they should all be in vector or all in raster format. Usually the conversion is from vector to raster, because most of the analysis is done in the raster domain. Vector data are transformed to raster data by overlaying a grid with a user-defined cell size. 
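Overlaying a grid on vector data can be sketched with a cell-centre rule, one common convention assumed here: a cell gets the value 1 if its centre falls inside the polygon, tested with the even-odd (ray casting) rule. The L-shaped polygon, cell size, and grid extent are hypothetical, chosen so that no cell centre lies exactly on a boundary.

```python
Coord = tuple[float, float]


def point_in_polygon(x: float, y: float, ring: list[Coord]) -> bool:
    """Even-odd test: count how many polygon edges a ray to the right crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(ring, x_min, y_max, cell_size, n_cols, n_rows):
    """Set a cell to 1 if its centre falls inside the polygon, else 0.
    Row 0 is the top (northernmost) row, as is usual for raster grids."""
    grid = []
    for row in range(n_rows):
        yc = y_max - (row + 0.5) * cell_size
        grid.append([
            1 if point_in_polygon(x_min + (col + 0.5) * cell_size, yc, ring) else 0
            for col in range(n_cols)
        ])
    return grid


# Hypothetical L-shaped parcel in map units.
parcel = [(0.2, 0.2), (2.8, 0.2), (2.8, 1.8), (1.2, 1.8), (1.2, 3.8), (0.2, 3.8)]
grid = rasterize(parcel, x_min=0.0, y_max=4.0, cell_size=1.0, n_cols=4, n_rows=4)
for row in grid:
    print(row)
```

The user-defined cell size controls the trade-off: smaller cells follow the polygon edge more closely but multiply the number of cells to store.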

Sometimes raster data are converted into vector format. This is done especially to achieve data reduction, because raster data usually need much more storage space than vector data. 

A digital data file with spatial and attribute data may already exist in some form, for example a national database or specific databases from ministries, projects, or companies. In some cases a conversion is necessary before these data can be loaded into the desired database. 

Commonly used attribute databases include dBase and Oracle. Spreadsheet programmes such as Lotus, Quattro, or Excel are sometimes used, although they cannot be regarded as true database software. 

Remote-sensing images are digital datasets recorded by satellite operating agencies and stored in their own image database. They usually have to be converted into the format of the spatial (raster) database before they can be downloaded