A
vector dataset (sometimes called a
feature dataset) stores information about discrete objects, using an encoding of the
vector logical data model to represent the location or
geometry of each object, and an encoding of its other properties that is usually based on
relational database technology. Typically, a single dataset collects information about a set of closely related or similar objects, such as all of the roads in a city.

The vector data model uses
coordinate geometry to represent each shape as one of several
geometric primitives, most commonly
points (a single coordinate of zero
dimension),
lines (a one-dimensional ordered list of coordinates connected by straight lines), and
polygons (a self-closing boundary line enclosing a two-dimensional region). Many data structures have been developed to encode these primitives as digital data, but most modern vector file formats are based on the
Open Geospatial Consortium (OGC)
Simple Features specification, often directly incorporating its
Well-known text (WKT) or Well-known binary (WKB) encodings.

In addition to the geometry of each object, a vector dataset must also be able to store its
attributes. For example, a database that describes lakes may contain each lake's depth, water quality, and pollution level. Since the 1970s, almost all vector file formats have adopted the
relational database model, either in principle or by directly incorporating
RDBMS software. Thus, the entire dataset is stored in a
table, with each
row representing a single object and the
columns holding its attributes. Geometry and attributes have been combined using two strategies:
• A
georelational format stores the geometry and attributes as two separate files, with the two parts of each object linked by file ordering or a
primary key. This approach was most common from the 1970s through the early 1990s, because GIS software developers had to invent their own geometry data structures but incorporated existing relational database file formats for the attributes. For example, the
Esri Shapefile format includes a .dbf file, a format taken from the DOS
dBase software.
• The
object-based model stores the geometry and attributes in a single structure, loosely or directly based on the objects in
object-oriented programming languages. This is the basis of most modern file formats, including
spatial databases that include a geometry column along with the other attributes in a single relational table. Other formats, such as
GeoJSON, use different structures for geometry and attributes, but combine them for each object in the same file.
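The difference between the two strategies can be illustrated with a minimal Python sketch. It assumes a hypothetical georelational dataset held as two in-memory tables (standing in for two separate files) that share a primary key, and joins them into GeoJSON-style features in which each object carries both its geometry and its attributes. The table contents and the function name `join_by_key` are invented for illustration and do not correspond to any specific file format.

```python
import json

# Georelational layout: geometry and attributes are kept in separate tables
# (separate files on disk), linked only by a shared primary key.
geometry_table = {
    1: {"type": "Point", "coordinates": [10.0, 20.0]},
    2: {"type": "LineString", "coordinates": [[0.0, 0.0], [5.0, 5.0], [9.0, 5.0]]},
}
attribute_table = {
    1: {"name": "Well A", "depth_m": 35},
    2: {"name": "Main Street", "lanes": 2},
}


def join_by_key(geometries, attributes):
    """Combine the two tables into object-based, GeoJSON-style features."""
    features = []
    for key in sorted(geometries):
        features.append({
            "type": "Feature",
            "id": key,
            "geometry": geometries[key],
            # An object with no attribute row gets an empty property set.
            "properties": attributes.get(key, {}),
        })
    return {"type": "FeatureCollection", "features": features}


collection = join_by_key(geometry_table, attribute_table)
print(json.dumps(collection, indent=2))
```

In the joined result, every feature is self-contained, which is the property that distinguishes the object-based approach from the georelational one.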
Geospatial topology is often an important part of vector data, representing the inherent spatial relationships (especially adjacency) between objects. Topology has been managed in vector file formats in four ways. In a
topological data structure, most notably Harvard's POLYVRT and its successor the
ARC/INFO coverage, topological connections between points, lines, and polygons are an inherent part of the encoding of those features. A
topology rulebase is a list of desired topology rules used to enforce spatial integrity in spaghetti data (data stored without explicit topological relationships), such as "county polygons must not overlap" and "state polygons must share boundaries with county polygons."

Vector datasets usually represent discrete
geographical features, such as buildings, trees, and counties. However, they may also be used to represent
geographical fields by storing locations where the spatially continuous field has been sampled. Sample points (e.g.,
weather stations and
sensor networks),
contour lines, and
triangulated irregular networks (TIN) are used to represent elevation or other values that change continuously over space. TINs record values at point locations, which are connected by lines to form an irregular mesh of triangles. The faces of the triangles represent the terrain surface.
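As a minimal sketch of how a triangle face can stand in for the terrain surface, the following Python function estimates a value inside a single TIN triangle by linear (barycentric) interpolation between its three vertices. The function name, the tolerance, and the sample coordinates are illustrative assumptions and are not part of any particular TIN file format.

```python
def interpolate_on_triangle(point, a, b, c, tolerance=1e-12):
    """Estimate z at point=(x, y) on the plane through vertices a, b, c.

    Each vertex is an (x, y, z) tuple. Raises ValueError if the triangle is
    degenerate or the point falls outside it.
    """
    x, y = point
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = a, b, c

    # Twice the signed area of the triangle; zero means collinear vertices.
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle")

    # Barycentric weights: the share of each vertex in the result.
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2

    # A negative weight (beyond rounding error) means the point is outside.
    if min(w1, w2, w3) < -tolerance:
        raise ValueError("point lies outside the triangle")

    return w1 * z1 + w2 * z2 + w3 * z3


# Hypothetical sample triangle with elevations at its three vertices.
vertex_a = (0.0, 0.0, 100.0)
vertex_b = (10.0, 0.0, 110.0)
vertex_c = (0.0, 10.0, 120.0)

print(interpolate_on_triangle((2.0, 2.0), vertex_a, vertex_b, vertex_c))
```

The plane through the sample vertices is z = 100 + x + 2y, so the value printed for the point (2, 2) should be close to 106. A full TIN would additionally need a way to locate the triangle that contains a given point, which is omitted here.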
===Example vector file formats===
Formats commonly in current usage:
• Shapefile – a popular vector data GIS format, developed by Esri
• Geography Markup Language (GML) – XML-based open standard (by OpenGIS) for GIS data exchange
• GeoJSON – a lightweight format based on JSON, used by many open source GIS packages
• GeoMedia – Intergraph's Microsoft Access based format for spatial vector storage
• Keyhole Markup Language (KML) – XML-based open standard (by OpenGIS) for GIS data exchange
• MapInfo TAB format – MapInfo's vector data format using TAB, DAT, ID and MAP files
• Measure Map Pro format – XML data format to store GIS data
• National Transfer Format (NTF) – a transfer format mostly used by the UK Ordnance Survey
• Spatialite – a spatial extension to SQLite, providing vector geodatabase functionality. It is similar to PostGIS, Oracle Spatial, and SQL Server with spatial extensions
• Simple Features – Open Geospatial Consortium specification for vector data
• Well-known text (WKT) – a text markup language for representing feature geometry, developed by the Open Geospatial Consortium
• Well-known binary (WKB) – binary version of well-known text, used in many spatial databases
• SOSI – a spatial data format used for all public exchange of spatial data in Norway
• AutoCAD DXF – data transfer format for AutoCAD data (by Autodesk)
• Geographic Data Files (GDF) – an interchange file format for geographic data
Historical formats seldom used today:
• ArcInfo Coverage – topological data structure used in Arc/INFO from 1981 through 2000
• Esri TIN – proprietary binary format for triangulated irregular network data used by Esri
• Digital line graph (DLG) – a USGS format for vector data
• TIGER – Topologically Integrated Geographic Encoding and Referencing
• Vector Product Format (VPF) – National Geospatial-Intelligence Agency (NGA)'s format of vectored data for large geographic databases
• Spatial Data File – Autodesk's high-performance geodatabase format, native to MapGuide
• ISFC – Intergraph's MicroStation based CAD solution attaching vector elements to a relational Microsoft Access database
• Dual Independent Map Encoding (DIME) – a historic GIS file format, developed in the 1960s
==Advantages and disadvantages==