Manage collections and tables
Collections and tables are containers for data within keyspaces in a database.
Whether you use a collection or table depends on your database type, your data’s schema type, and how strictly you want to enforce the schema:
- Collections
-
Collections use dynamic schemas and store data in documents. With a dynamic schema, each document can have different fields. Collections are best for semi-structured data.
You can create collections only in Serverless (Vector) databases.
- Tables
-
Tables use fixed schemas and store data in rows. With a fixed schema, all rows must have the same columns, and every column must have a value, which can be null. Tables are best for structured data.
You can create tables in Serverless (Non-Vector) and Serverless (Vector) databases.
Prerequisites
-
An Astra DB Serverless database with a keyspace.
-
A role with table permissions, such as the Database Administrator role.
Table permissions apply to both collections and tables.
To create collections and tables with the Data API, you need an application token with an appropriately scoped role.
Consider your data model
While an optimal data model isn’t necessary for small tests, it is important for production applications, including robust development and testing environments for your applications.
Before you create collections and tables for production applications, take time to prepare an effective data model for your application’s needs. Consider the following:
-
Data types and schemas that you want to use.
For example, if you want to enforce the schema, then you must use tables.
-
Which data needs to be indexed.
For example, collections index all fields by default. If you want to apply selective indexing, you must create your collection with the Data API and define the indexing clause.
-
How you want to query the data.
For example, if you want to use Astra DB’s built-in vector search capabilities, then you must store your data in a Serverless (Vector) database.
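As a rough illustration of the selective indexing point above, the following sketch builds a Data API `createCollection` command body with an `indexing` clause. The collection name `products` and the field names are hypothetical, and the `deny`/`allow` keys reflect an assumption about the shape of the clause; check the Data API reference before relying on them.

```python
import json


def build_selective_index_command(name, deny_fields):
    """Build a createCollection command that leaves some fields unindexed."""
    return {
        "createCollection": {
            "name": name,
            "options": {
                # Assumed shape: "deny" lists fields to exclude from indexing.
                # An "allow" list would instead name the only fields to index.
                "indexing": {"deny": list(deny_fields)},
            },
        }
    }


# Hypothetical collection and field names, for illustration only.
command = build_selective_index_command("products", ["description", "raw_html"])
print(json.dumps(command, indent=2))
```

Because collection settings cannot be changed later, it is worth deciding on the indexed fields before the collection is created.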
Understand vector data settings
Both collections and tables can store vector and non-vector data, and it is a common practice to store vector data alongside non-vector metadata. However, if you want to use Astra DB’s built-in vector search capabilities, then you must store your data in a Serverless (Vector) database.
For tables, you can create, modify, and drop vector columns and indexes at any time.
However, for collections, you must configure vector-related settings when you create the collection. This includes the following:
-
Support for vector data, also known as a vector-enabled collection
-
The number of dimensions and the similarity metric for the vectors in your dataset
-
An embedding provider integration, if you plan to use one now or might use one in the future
-
Support for hybrid search
For vector-enabled collections, you decide how to provide embeddings:
-
Generate embeddings outside Astra, and then load the embeddings when you insert data.
-
Use an embedding provider integration to automatically generate embeddings.
-
Use both options.
For more information about these options, see Auto-generate embeddings with vectorize and $vector and $vectorize in collections.
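To make the similarity metric setting more concrete, here is a minimal sketch of the textbook definitions of the dot product, cosine similarity, and Euclidean distance for two vectors with the same number of dimensions. It is not Astra DB's implementation, and the way Astra DB converts these values into a similarity score may differ. The example vectors are made up.

```python
import math


def _check_same_length(a, b):
    # Vectors that are compared must have the same number of dimensions.
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")


def dot_product(a, b):
    _check_same_length(a, b)
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot_product(a, b) / (norm_a * norm_b)


def euclidean_distance(a, b):
    _check_same_length(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Made-up two-dimensional vectors, for illustration only.
query = [3.0, 4.0]
candidate = [4.0, 3.0]
print(dot_product(query, candidate))
print(cosine_similarity(query, candidate))
print(euclidean_distance(query, candidate))
```

The dimension check matters in practice: every vector stored in a collection must match the number of dimensions configured when the collection was created.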
Data manipulation in multi-region databases
For multi-region databases, the Astra Portal’s Data Explorer accesses and manipulates keyspaces, collections, tables, and data from the primary region. If you need to manage your database from a secondary region, you must use the Data API or CQL. Generally, you access a secondary region to reduce latency when the primary region is geographically distant from the caller, or when the primary region is experiencing an outage. Because multi-region databases follow an eventually consistent model, changes to data in any region are eventually replicated to the database’s other regions.
Collections
Collections are only available in Serverless (Vector) databases. For Serverless (Non-Vector) databases, see Create a table.
Create a collection
Collection settings are permanent. If you need to change the settings after creating a collection, you must delete the collection and create a new one with the desired settings.
You can create a collection in the Astra Portal or with the Data API.
While Serverless (Vector) databases can have both collections and tables, the Astra Portal’s Data Explorer only supports collection creation, and it lists all collections and tables under the Collections label.
To create tables in a Serverless (Vector) database, you must use CQL or the Data API.
-
Astra Portal
-
Data API
-
In the Astra Portal navigation menu, click Databases, and then click the name of your Serverless (Vector) database.
-
Click Data Explorer.
-
In the Keyspace field, select the keyspace where you want to create the collection or use default_keyspace.
-
Click Create Collection.
-
In the Create collection dialog, enter a name for the collection.
Rules for collection names
-
Can contain letters, numbers, and underscores
-
Must have a length of 2 to 50 characters
-
Must be unique within the keyspace
-
Decide whether you want this collection to support vector data:
-
If you want to store vector data in this collection, enable Vector-enabled collection.
-
If you don’t want to store vector data in this collection, disable Vector-enabled collection.
-
For vector-enabled collections, select an Embedding generation method:
-
Bring my own: Select this option if you only want to generate your own embeddings and import them when you insert data into your collection. Then, enter the number of dimensions for the vectors in your dataset, and select a similarity metric. You can enter custom dimensions or select from common embedding models and dimensions. The available similarity metrics are Cosine, Dot Product, and Euclidean.
-
Use an embedding provider integration: If you want to automatically generate embeddings when you insert data, attach an embedding provider integration to your collection, and then configure the model, dimensions, and similarity metric. Available models and dimensions vary by provider.
For applicable databases, the built-in NVIDIA embedding provider integration is selected by default. Other providers require additional setup before you can use them with a collection. For more information, see Auto-generate embeddings with vectorize.
You cannot attach an embedding provider integration to a collection after you create the collection. If you want to use an embedding provider integration, you must enable it when you create the collection.
You can manually provide embeddings even if the collection has a vectorize integration. However, you must ensure that the manually provided embeddings come from the same model and have the same dimensions as the automatically generated embeddings.
-
Click Create collection.
You can use the Data API to programmatically create a collection.
For more information and examples, see the Data API reference for creating a collection and the documentation for your embedding provider integration.
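As a hedged sketch of what a programmatic request might contain, the following code checks a collection name against the naming rules listed for the Astra Portal and then builds a `createCollection` command body with vector settings. The option names (`vector`, `dimension`, `metric`, `service`, `provider`, `modelName`) and the metric strings are assumptions about the Data API's JSON shape, the collection name is hypothetical, and the pattern treats "letters" as ASCII letters only. Uniqueness within the keyspace cannot be checked locally.

```python
import re

# Letters, numbers, and underscores; 2 to 50 characters (per the rules above).
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]{2,50}")
# Assumed Data API spellings of Cosine, Dot Product, and Euclidean.
METRICS = {"cosine", "dot_product", "euclidean"}


def is_valid_name(name):
    return NAME_PATTERN.fullmatch(name) is not None


def build_create_collection(name, dimension, metric="cosine",
                            provider=None, model_name=None):
    """Build a createCollection command for a vector-enabled collection."""
    if not is_valid_name(name):
        raise ValueError(f"invalid collection name: {name!r}")
    if metric not in METRICS:
        raise ValueError(f"unknown similarity metric: {metric!r}")
    vector = {"dimension": dimension, "metric": metric}
    if provider is not None:
        # An embedding provider can only be attached at creation time.
        vector["service"] = {"provider": provider, "modelName": model_name}
    return {"createCollection": {"name": name, "options": {"vector": vector}}}


# Hypothetical "bring my own embeddings" collection with 1024 dimensions.
command = build_create_collection("support_articles", 1024, metric="cosine")
print(command)
```

Validating locally before sending the request gives faster feedback, but the database remains the source of truth for what it accepts.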
After you create a collection, insert data into the collection.
Troubleshoot collection creation
- Collection limit reached or TOO_MANY_INDEXES
-
If you get a Collection Limit Reached or TOO_MANY_INDEXES message, you must delete a collection before you can create a new one.
Serverless (Vector) databases created after June 24, 2024 can have approximately 10 collections. Databases created before this date can have approximately 5 collections. The collection limit is based on the number of indexes.
- Embedding provider isn’t available when creating a collection
-
There are a few reasons why an embedding provider might not be listed when creating a collection:
-
You already enabled the integration: The Add embedding provider integration option only allows you to configure a new embedding provider integration for the first time. If you have already set up an embedding provider integration in your organization, you must manage it through your organization’s Integrations settings. For example, if you want to add another API key, you must do so in the Integration settings, and then create your collection afterwards.
-
Your database isn’t in the integration’s scope: Embedding provider API keys are scoped to specific databases. If you want to use the same integration in multiple databases, you must add all relevant databases to the integration’s API key’s scope in your organization’s Integration settings.
-
The embedding provider isn’t supported for automatic embedding generation: Astra DB only supports certain embedding providers for automatic embedding generation.
For a full list of supported providers and documentation for each integration, see Auto-generate embeddings with vectorize.
- NVIDIA embedding provider isn’t available for a collection
-
The NVIDIA embedding provider integration is only available in specific regions. For more information, see Integrate NVIDIA as an embedding provider.
Delete a collection
Deleting a collection permanently deletes all data in the collection.
-
Astra Portal
-
Data API
-
In the Astra Portal navigation menu, click Databases, and then click the name of your Serverless (Vector) database.
-
Click Data Explorer.
-
In the Keyspace field, select the keyspace that contains the collection you want to delete.
-
In the Collections section, find the collection you want to delete, click More, and then click Delete collection.
-
In the Delete collection dialog, enter the collection name, and then click Delete collection.
The collection and all of its data are permanently deleted.
You can use the Data API to programmatically delete a collection in a Serverless (Vector) database.
For more information and examples, see the Data API reference for deleting a collection.
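Because deletion is permanent, a script may want a guard similar to the Astra Portal dialog, which asks you to type the collection name before deleting. The following sketch shows one way to do that. The helper is hypothetical, and the `deleteCollection` command shape is an assumption about the Data API's JSON format.

```python
def build_delete_collection(name, confirm_name):
    """Build a deleteCollection command only if the name is typed twice."""
    if name != confirm_name:
        # Mirrors the Portal dialog: refuse unless the caller repeats the name.
        raise ValueError("confirmation does not match the collection name")
    return {"deleteCollection": {"name": name}}


# Hypothetical collection name, for illustration only.
command = build_delete_collection("support_articles", "support_articles")
print(command)
```

The guard does nothing on the server side; it only reduces the chance that a typo in a script removes the wrong collection.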
Tables
To create tables in Serverless (Vector) databases, you must use CQL or the Data API.
For Serverless (Non-Vector) databases, you must use CQL.
Create a table
You can create tables in Serverless (Non-Vector) and Serverless (Vector) databases.
-
Astra Portal (cqlsh)
-
Data API
You can use the built-in CQL shell (cqlsh) in the Astra Portal, the standalone CQL shell, or a driver to manage tables.
For information about CQL shell and drivers, see Cassandra Query Language (CQL) for Astra DB.
To use the CQL shell in the Astra Portal to create a table, do the following:
-
In the Astra Portal navigation menu, click Databases, and then click the name of the database where you want to create a table.
-
Note the name of the keyspace where you want to create the table.
-
Click CQL Console, and then wait for the token@cqlsh> prompt to appear.
-
Select the keyspace that you want to create the table in:
use KEYSPACE_NAME;
-
Create a table:
CREATE TABLE users (
  firstname text,
  lastname text,
  email text,
  "favorite color" text,
  PRIMARY KEY (firstname, lastname)
) WITH CLUSTERING ORDER BY (lastname ASC);
Rules for table names
-
Can contain letters, numbers, and underscores
-
Must have a length of 2 to 50 characters
-
Must be unique within the keyspace
You can use the Data API to programmatically create a table in a Serverless (Vector) database.
For more information and examples, see the Data API reference for creating a table.
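For comparison with the CQL statement above, the following sketch builds a `createTable` command body for a simplified `users` table. The `definition`, `columns`, `primaryKey`, `partitionBy`, and `partitionSort` keys are assumptions about the Data API's JSON shape, with `1` assumed to mean ascending order. The quoted `"favorite color"` column is left out because it is unclear whether the Data API accepts such names.

```python
import json

create_table_command = {
    "createTable": {
        "name": "users",
        "definition": {
            "columns": {
                "firstname": {"type": "text"},
                "lastname": {"type": "text"},
                "email": {"type": "text"},
            },
            # Assumed equivalent of PRIMARY KEY (firstname, lastname)
            # WITH CLUSTERING ORDER BY (lastname ASC).
            "primaryKey": {
                "partitionBy": ["firstname"],
                "partitionSort": {"lastname": 1},
            },
        },
    }
}


def primary_key_columns_defined(command):
    """Check that every primary key column is declared in the column list."""
    definition = command["createTable"]["definition"]
    key = definition["primaryKey"]
    key_columns = list(key["partitionBy"]) + list(key["partitionSort"])
    return all(column in definition["columns"] for column in key_columns)


print(primary_key_columns_defined(create_table_command))
print(json.dumps(create_table_command, indent=2))
```

The small consistency check reflects the fixed-schema nature of tables: the primary key can only refer to columns that the table declares.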
After you create a table, insert data into the table.
Delete a table
Deleting a table permanently deletes all data in the table.
-
Astra Portal (cqlsh)
-
Data API
You can use the built-in CQL shell (cqlsh) in the Astra Portal, the standalone CQL shell, or a driver to manage tables.
For information about CQL shell and drivers, see Cassandra Query Language (CQL) for Astra DB.
To use the CQL shell in the Astra Portal to delete a table, do the following:
-
In the Astra Portal navigation menu, click Databases, and then click the name of your database.
-
Note the name of the keyspace that contains the table you want to delete.
-
Click CQL Console, and then wait for the token@cqlsh> prompt to appear.
-
Select the keyspace that contains the table you want to delete:
use KEYSPACE_NAME;
-
Get a list of all tables in the keyspace:
desc tables;
-
Delete the table and all of its data:
drop table TABLE_NAME;
The table and its data are deleted.
You can use the Data API to programmatically delete a table in a Serverless (Vector) database.
For more information and examples, see the Data API reference for deleting a table.
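As a rough sketch of how such a request could be assembled with only the Python standard library, the following code prepares, but does not send, an HTTP request that carries a `dropTable` command. The URL path `/api/json/v1/KEYSPACE`, the `Token` header, and the command shape are assumptions about the Data API. The endpoint, token, and table name are placeholders.

```python
import json
import urllib.request


def build_drop_table_request(api_endpoint, keyspace, token, table_name):
    """Prepare (without sending) a POST request that drops a table."""
    # Assumed path: commands for a keyspace are posted to /api/json/v1/KEYSPACE.
    url = f"{api_endpoint.rstrip('/')}/api/json/v1/{keyspace}"
    body = json.dumps({"dropTable": {"name": table_name}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        # Assumed header name for the application token.
        headers={"Token": token, "Content-Type": "application/json"},
    )


# Placeholder endpoint and token; replace them with your database's values.
request = build_drop_table_request(
    "https://example.invalid", "default_keyspace", "AstraCS:EXAMPLE", "users"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the prepared request to `urllib.request.urlopen` would send it. That step is left out here because dropping a table permanently deletes its data.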