The central concept of a document-oriented database is the notion of a document. Although implementations vary in their specific definitions, document-oriented databases generally treat documents as self-contained units that encapsulate and encode data in a standardized format. Common encoding formats include XML, YAML, and JSON, as well as binary representations such as
BSON.

Documents in a document store are roughly equivalent to the programming concept of an object. They are not required to adhere to a fixed schema, and documents within the same collection may contain different fields or structures. Fields may be optional, and documents of the same logical type may differ in composition. For example, the following illustrates a document encoded in JSON:

    {
        "firstName": "Bob",
        "lastName": "Smith",
        "address": {
            "type": "Home",
            "street1": "5 Oak St.",
            "city": "Boys",
            "state": "AR",
            "zip": "32225",
            "country": "US"
        },
        "hobby": "sailing",
        "phone": {
            "type": "Cell",
            "number": "(555)-123-4567"
        }
    }

A second document might be encoded in XML as:

    <contact>
        <firstName>Bob</firstName>
        <lastName>Smith</lastName>
        <phone>(123) 555-0178</phone>
        <phone>(890) 555-0133</phone>
        <address>
            <type>Home</type>
            <street1>123 Back St.</street1>
            <city>Boys</city>
            <state>AR</state>
            <zip>32225</zip>
            <country>US</country>
        </address>
    </contact>

The two example documents share some structural elements but also contain unique fields. The structure, text, and other data within each document are collectively referred to as the document's content and can be accessed or modified using retrieval or editing operations. Unlike relational databases, in which each record contains the same fields and unused fields are left empty, document-oriented databases do not require uniform fields across documents. This design allows new information to be added to some documents without affecting the structure of others.

Document databases often support the storage of additional metadata alongside the document content. Such metadata may relate to organizational features, security, indexing, or other implementation-specific features.
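
The schema flexibility described above can be illustrated with a minimal Python sketch. It assumes documents are held as plain dictionaries in a list; the `contacts` list and the `field_names` helper are hypothetical names introduced only for illustration and do not correspond to any particular database product.

```python
import json

# Hypothetical collection: two contact documents of the same logical type
# whose fields differ (no shared, fixed schema).
contacts = [
    {
        "firstName": "Bob",
        "lastName": "Smith",
        "address": {"city": "Boys", "state": "AR"},
        "hobby": "sailing",
    },
    {
        "firstName": "Bob",
        "lastName": "Smith",
        "phone": [{"number": "(123) 555-0178"}, {"number": "(890) 555-0133"}],
    },
]


def field_names(doc: dict) -> set:
    """Return the top-level field names of one document."""
    return set(doc)


shared = field_names(contacts[0]) & field_names(contacts[1])
only_first = field_names(contacts[0]) - field_names(contacts[1])

# Adding a field to one document does not change the shape of the other.
contacts[1]["nickname"] = "Bobby"

print(sorted(shared))      # ['firstName', 'lastName']
print(sorted(only_first))  # ['address', 'hobby']
print(json.dumps(contacts[1], indent=2))
```

In a relational table the second row would need empty `address` and `hobby` columns; here each document simply omits the fields it does not use.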

=== CRUD operations ===
The core operations supported by a document-oriented database for manipulating documents are similar to those in other databases. Although terminology is not perfectly standardized, these operations are generally recognized as Create, Read, Update, and Delete (CRUD).
• Creation (C): Adds a new document to the database.
• Retrieval (R): Retrieves documents or fields based on queries.
• Update (U): Modifies the contents of existing documents.
• Deletion (D): Removes documents from the database.
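
As a rough sketch of these four operations, the following toy in-memory store maps each one to a method. The `DocumentStore` class and its method names are assumptions made for this example, not the API of any real database; keys are generated with `uuid` when the caller does not supply one.

```python
import copy
import uuid
from typing import Any, Dict, Optional


class DocumentStore:
    """Toy in-memory document store (illustrative only)."""

    def __init__(self) -> None:
        # The dictionary doubles as the index from key to document.
        self._docs: Dict[str, Dict[str, Any]] = {}

    def create(self, doc: Dict[str, Any], key: Optional[str] = None) -> str:
        """C: add a new document and return its key."""
        key = key or uuid.uuid4().hex
        if key in self._docs:
            raise KeyError(f"duplicate key: {key}")
        self._docs[key] = copy.deepcopy(doc)
        return key

    def read(self, key: str) -> Optional[Dict[str, Any]]:
        """R: return a copy of the document, or None if it does not exist."""
        doc = self._docs.get(key)
        return copy.deepcopy(doc) if doc is not None else None

    def update(self, key: str, changes: Dict[str, Any]) -> None:
        """U: merge top-level field changes into an existing document."""
        self._docs[key].update(copy.deepcopy(changes))

    def delete(self, key: str) -> None:
        """D: remove the document from the store."""
        del self._docs[key]


store = DocumentStore()
key = store.create({"firstName": "Bob", "lastName": "Smith"})
store.update(key, {"hobby": "sailing"})
print(store.read(key))
store.delete(key)
print(store.read(key))  # None
```

Copies are made on the way in and out so that callers cannot mutate stored documents by accident; a real system would persist and index the data rather than keep it in a dictionary.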

=== Keys ===
Documents in a document-oriented database are addressed via a unique identifier. This identifier, often a string, URI, or path, can be used to retrieve the document from the database. Most document stores maintain an index on the key to optimize retrieval, and in some implementations the key is required when creating or inserting a new document.

=== Retrieval ===
In addition to key-based access, document-oriented databases typically provide an API or query language that enables retrieval based on document content or associated metadata. For example, a query may return all documents with a specific field matching a given value. The available query features, indexing options, and performance characteristics vary across implementations.

Document stores differ from key-value stores in that they exploit the internal structure and metadata of stored documents. In many key-value stores, values are treated as opaque or "black-box" data, meaning the database system does not interpret their internal structure. By contrast, document-oriented databases can classify and interpret document content. This enables queries that distinguish between types of data: for example, retrieving all phone numbers containing "555" without also matching a postal code such as "55555".
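
The contrast with opaque values can be sketched in Python. This example assumes documents are nested dictionaries and addresses fields with a dotted path such as `phone.number`; the helpers `values_at` and `find_containing` and the sample data are hypothetical, not part of any real query language.

```python
import json
from typing import Any, Dict, Iterator, List


def values_at(doc: Any, path: str) -> Iterator[Any]:
    """Yield every value found at a dotted path, fanning out over lists."""
    current = [doc]
    for part in path.split("."):
        next_level = []
        for node in current:
            candidates = node if isinstance(node, list) else [node]
            for item in candidates:
                if isinstance(item, dict) and part in item:
                    next_level.append(item[part])
        current = next_level
    for node in current:
        if isinstance(node, list):
            yield from node
        else:
            yield node


def find_containing(docs: List[Dict], path: str, needle: str) -> List[Dict]:
    """Return documents whose string value(s) at `path` contain `needle`."""
    return [
        d for d in docs
        if any(isinstance(v, str) and needle in v for v in values_at(d, path))
    ]


docs = [
    {"lastName": "Smith", "phone": {"number": "(555)-123-4567"},
     "address": {"zip": "32225"}},
    {"lastName": "Jones", "phone": {"number": "(890) 444-0133"},
     "address": {"zip": "55555"}},
]

# Structure-aware query: only the phone number field is inspected.
matches = find_containing(docs, "phone.number", "555")

# Opaque-value search: the serialized blob also matches the postal code.
opaque_matches = [d for d in docs if "555" in json.dumps(d)]

print([d["lastName"] for d in matches])         # ['Smith']
print([d["lastName"] for d in opaque_matches])  # ['Smith', 'Jones']
```

The second search stands in for what a store could do if it only saw uninterpreted bytes: it cannot tell a phone number from a postal code.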

=== Editing ===
Document databases typically provide mechanisms for updating or editing the content or metadata of a document. Updates may involve replacing the entire document or modifying individual elements or fields within the document.
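
The two styles of update can be sketched as follows, assuming an in-memory dictionary of documents keyed by identifier. The function names `replace_document` and `set_field`, the dotted-path convention, and the key `"contacts/bob"` are illustrative assumptions rather than any product's interface.

```python
from typing import Any, Dict


def replace_document(store: Dict[str, Dict[str, Any]], key: str,
                     new_doc: Dict[str, Any]) -> None:
    """Whole-document update: the old content is discarded entirely."""
    store[key] = new_doc


def set_field(doc: Dict[str, Any], path: str, value: Any) -> None:
    """Partial update: set one nested field addressed by a dotted path.

    Missing intermediate objects are created. If an intermediate value
    exists but is not a dictionary, this sketch raises TypeError.
    """
    *parents, leaf = path.split(".")
    node = doc
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value


store = {"contacts/bob": {"firstName": "Bob", "address": {"city": "Boys"}}}

# Modify a single field; the rest of the document is left as it was.
set_field(store["contacts/bob"], "address.zip", "32225")
print(store["contacts/bob"])

# Replace the entire document; fields absent from the new version are gone.
replace_document(store, "contacts/bob", {"firstName": "Robert"})
print(store["contacts/bob"])  # {'firstName': 'Robert'}
```

Partial updates avoid resending unchanged content, while whole-document replacement is simpler to reason about because the stored result is exactly what was supplied.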

=== Organization ===
Document database implementations support a variety of methods for organizing documents, including:
• Collections: Groups of documents. Depending on the implementation, a document may be required to belong to a single collection or may be allowed in multiple collections.
• Tags and non-visible metadata: Additional data stored outside the main document content.
• Directory hierarchies: Documents organized in a tree-like structure, often based on path or URI.
These organizational structures may differ between logical and physical representations (e.g. on disk or in memory).

== Relationship to other databases ==