DynamoDB - Global Secondary Indexes

Applications that require different types of queries with different attributes can use one or more global secondary indexes when executing these detailed queries.
For example − Consider a system that keeps track of users, their login status, and the time they logged in. As this data grows, queries on it slow down.
Global secondary indexes speed up queries by organizing a selection of attributes from a table. They use their own key attributes to sort data, and they require neither the table's key attributes nor a key schema identical to the table's.
Every global secondary index must have a partition key and can optionally have a sort key. The index key schema can differ from the table's, and the index key attributes can be any top-level string, number, or binary attributes of the table.
You can include other table attributes in a projection, but queries on an index do not fetch attributes from the parent table.
Attribute Projections
Projections consist of a set of attributes copied from a table into a secondary index. A projection always includes the table partition key and sort key. In queries, DynamoDB can access any projected attribute; the index essentially behaves as its own table.
When you create a secondary index, you must specify the attributes for the projection. DynamoDB offers three ways to accomplish this task −
- KEYS_ONLY - All index entries consist of table partition and sort key values, and index key values. This creates the smallest index.
- INCLUDE - Includes KEYS_ONLY attributes and the non-key attributes you specify.
- ALL - Includes all attributes of the source table, creating the largest possible index.
Note the trade-offs when projecting attributes into a global secondary index in terms of throughput and storage cost.
Consider the following points −
- If you only need low-latency access to a few attributes, project only the ones you need. This reduces storage and write costs.
- If your application frequently accesses certain non-key attributes, project them, because the storage costs pale in comparison to scan consumption.
- You can project large sets of frequently accessed attributes, but this comes at a high storage cost.
- Use KEYS_ONLY when the table is queried infrequently and written or updated frequently. This keeps the index small while still offering good query performance.
Global Secondary Index Queries and Scans
You can use Query to access one or more items in an index. You must specify the table name and the index name, the desired attributes, and the conditions; you can optionally return results in ascending or descending order.
You can also use Scan to get all index data. It requires the table name and the index name. Use a filter expression to retrieve specific data.
Synchronizing table and index data
DynamoDB automatically synchronizes indexes with their parent table. Each operation that modifies an item triggers an asynchronous update of the index; applications never write to indexes directly.
You need to understand how DynamoDB maintains indexes. When you create an index, you specify its key attributes and their data types, which means that on every write, the values of those attributes must match the data types in the key schema.
When an item is created or deleted, the indexes are updated in an eventually consistent manner; changes normally propagate within a fraction of a second (unless some type of system failure occurs). You must account for this delay in applications.
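One way an application might account for this propagation delay is to retry an index read a bounded number of times before treating the item as absent. The following is a minimal sketch that uses only the Java standard library; `IndexReadRetry`, `readWithRetry`, and the `lookup` supplier (a stand-in for a real index query) are hypothetical names and not part of the AWS SDK.

```java
import java.util.Optional;
import java.util.function.Supplier;

final class IndexReadRetry {

    // Calls lookup up to maxAttempts times, sleeping delayMillis between
    // attempts, and returns the first non-empty result. Returns empty if
    // every attempt comes back empty or the thread is interrupted.
    static <T> Optional<T> readWithRetry(Supplier<Optional<T>> lookup,
                                         int maxAttempts,
                                         long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<T> result = lookup.get();
            if (result.isPresent()) {
                return result;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Simulated index read: the item becomes visible on the third call.
        int[] calls = {0};
        Supplier<Optional<String>> lookup =
                () -> ++calls[0] < 3 ? Optional.<String>empty() : Optional.of("item");

        Optional<String> found = readWithRetry(lookup, 5, 10L);
        System.out.println(found.orElse("not found") + " after " + calls[0] + " attempts");
    }
}
```

A bounded retry keeps the worst-case wait predictable; whether retrying is appropriate at all depends on how stale a read the application can tolerate.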
Throughput considerations in global secondary indexes - Multiple global secondary indexes affect throughput. When you create an index, you specify capacity units for it separately from the table, so operations on the index consume index capacity units rather than table capacity units.
This can result in throttling if a query or write exceeds the provisioned throughput. View throughput settings with DescribeTable.
Read capacity - Global secondary indexes provide eventual consistency. In queries, DynamoDB performs the same provisioning calculations as for tables, with the only difference being that it uses the size of the index entry rather than the size of the table item. The limit on returned query results remains 1 MB, which includes the size of the attribute name and value for every returned item.
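The read calculation can be sketched as follows. This is an illustration only: it assumes DynamoDB's general capacity rules, which this section does not restate (one read capacity unit covers up to 4 KB, the combined size of the returned index entries is rounded up to the next 4 KB, and an eventually consistent read costs half a unit). The class and method names are hypothetical.

```java
final class IndexReadCost {

    // Assumption: one read capacity unit covers up to 4 KB of data.
    private static final long READ_UNIT_BYTES = 4L * 1024L;

    // Read capacity units for one eventually consistent Query on an index,
    // given the combined size in bytes of all index entries returned.
    static double queryReadUnits(long totalIndexEntryBytes) {
        if (totalIndexEntryBytes <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        // Round the total size up to whole 4 KB blocks.
        long blocks = (totalIndexEntryBytes + READ_UNIT_BYTES - 1) / READ_UNIT_BYTES;
        // Eventually consistent reads cost half a unit per block.
        return blocks * 0.5;
    }

    public static void main(String[] args) {
        // Ten index entries of 700 bytes each: 7000 bytes, which is two 4 KB blocks.
        System.out.println(queryReadUnits(10 * 700)); // prints 1.0
    }
}
```

Because the calculation uses index entry sizes, a narrow projection such as KEYS_ONLY lowers the read cost of a query compared with reading the full items from the table.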
Write capacity - When write operations occur, the affected index consumes write units. The write throughput cost is the sum of the write capacity units consumed in writing to the table and the units consumed in updating the indexes. A successful write operation requires sufficient capacity; otherwise it results in throttling.
The write cost also depends on certain factors, some of which are as follows −
- New items that define indexed attributes, or item updates that define previously undefined indexed attributes, use a single write operation to add the item to the index.
- Updates that change the value of an indexed key attribute use two writes: one to delete the old item from the index and one to write the new item.
- A table write that deletes an indexed attribute uses one write to erase the old item projection from the index.
- Items that are absent from the index both before and after the update operation use no writes.
- Updates that change only the value of projected attributes in the index, and not the value of any indexed key attribute, use a single write to update the values of the projected attributes in the index.
All of these factors assume that the size of each item in the index is at most 1 KB; larger index entries consume additional write capacity units.
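The rules above can be condensed into a small decision function. This is a sketch rather than an official formula: the class name, method name, and boolean parameters are hypothetical. The final case (an item that stays in the index while nothing the index stores changes) is not covered by the list above and is assumed here to cost no index write.

```java
final class IndexWriteCost {

    // Number of index writes caused by one table write, assuming every
    // index entry is at most 1 KB.
    static int indexWrites(boolean inIndexBefore,
                           boolean inIndexAfter,
                           boolean indexKeyChanged,
                           boolean projectedAttributeChanged) {
        if (!inIndexBefore && !inIndexAfter) {
            return 0; // item is never in the index
        }
        if (!inIndexBefore) {
            return 1; // indexed attribute newly defined: put the item into the index
        }
        if (!inIndexAfter) {
            return 1; // indexed attribute deleted: remove the old projection
        }
        if (indexKeyChanged) {
            return 2; // delete the old index item, then write the new one
        }
        if (projectedAttributeChanged) {
            return 1; // update the projected values in place
        }
        return 0; // assumption: nothing stored in the index changed
    }

    public static void main(String[] args) {
        System.out.println(indexWrites(false, true, false, false)); // new item: 1
        System.out.println(indexWrites(true, true, true, false));   // key change: 2
        System.out.println(indexWrites(true, false, false, false)); // attribute deleted: 1
    }
}
```

The total cost of a table write is then the table's own write units plus the sum of this count over every affected index.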
Global Secondary Index Storage
When an item is written, DynamoDB automatically copies the correct set of attributes to every index in which those attributes should exist. Your account is charged for the storage of the table items and also for the storage of the attributes in each index. The space used by an index item is the sum of the following −
- Size in bytes of the table primary key
- Size in bytes of the index key attribute
- Size in bytes of the projected attributes
- 100 bytes of overhead per index item
You can estimate storage requirements by estimating the average index item size and multiplying it by the number of items in the table that have the global secondary index key attributes.
DynamoDB does not write any index data for a table item in which an attribute defined as the index partition key or sort key is undefined.
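The storage estimate described above is simple arithmetic and can be sketched as follows. The class and method names are hypothetical, and the sample sizes in `main` are made-up figures chosen only to illustrate the calculation.

```java
final class IndexStorageEstimate {

    // Fixed per-item overhead stated in the list above.
    static final long OVERHEAD_BYTES_PER_ITEM = 100L;

    // Size of one index item: table primary key + index key attributes
    // + projected attributes + fixed overhead.
    static long bytesPerIndexItem(long tableKeyBytes, long indexKeyBytes, long projectedBytes) {
        return tableKeyBytes + indexKeyBytes + projectedBytes + OVERHEAD_BYTES_PER_ITEM;
    }

    // Total estimate: average index item size times the number of table
    // items that define the index key attributes.
    static long estimateTotalBytes(long averageBytesPerIndexItem, long itemsWithIndexKeys) {
        return Math.multiplyExact(averageBytesPerIndexItem, itemsWithIndexKeys);
    }

    public static void main(String[] args) {
        long perItem = bytesPerIndexItem(30, 20, 150); // 30 + 20 + 150 + 100 = 300
        System.out.println(perItem);
        System.out.println(estimateTotalBytes(perItem, 1_000_000L)); // 300000000
    }
}
```

Items that lack the index key attributes are excluded from the count, since DynamoDB writes no index data for them.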
Global Secondary Index CRUD
Create a table with global secondary indexes using the CreateTable operation in conjunction with the GlobalSecondaryIndexes parameter. You must specify an attribute to serve as the index partition key, and you can optionally specify another one as the index sort key. All index key attributes must be string, number, or binary scalars. You also need to specify the throughput settings, which consist of ReadCapacityUnits and WriteCapacityUnits.
Use UpdateTable to add global secondary indexes to existing tables, using the GlobalSecondaryIndexUpdates parameter.
In this operation, you have to provide the following inputs −
- Index name
- Key schema
- Projected attributes
- Throughput settings
Adding a global secondary index to a large table can take a significant amount of time, depending on item size, projected attribute size, write capacity, and write activity. Use CloudWatch metrics to monitor the process.
Use DescribeTable to get status information for a global secondary index. It returns one of four IndexStatus values for each entry in GlobalSecondaryIndexes −
- CREATING - Indicates that the index is being built and is not yet available.
- ACTIVE - Indicates that the index is ready for use.
- UPDATING - Indicates that the throughput settings are being updated.
- DELETING - Indicates that the index is being deleted and is permanently unavailable for use.
You can update the throughput settings of a global secondary index during the backfilling stage (while DynamoDB writes attributes to the index and tracks added, deleted, and updated items). Use UpdateTable to perform this operation.
Remember that you cannot add or delete other indexes during the backfilling stage.
Use UpdateTable to delete global secondary indexes. It lets you delete only one index per operation; however, you can run multiple operations concurrently, up to five. The delete process does not affect the read/write operations of the parent table, but you cannot add or delete other indexes until the operation completes.
Using Java to work with global secondary indexes
Create a table with an index via CreateTable. Simply create an instance of the DynamoDB class, create an instance of the CreateTableRequest class to hold the request information, and pass the request object to the createTable method.
The following program is a brief example −
import java.util.ArrayList;
import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Table;
import com.amazonaws.services.dynamodbv2.model.*;

DynamoDB dynamoDB = new DynamoDB(
    new AmazonDynamoDBClient(new ProfileCredentialsProvider()));

// Attributes used in the table and index key schemas
ArrayList<AttributeDefinition> attributeDefinitions = new ArrayList<AttributeDefinition>();
attributeDefinitions.add(new AttributeDefinition()
    .withAttributeName("City")
    .withAttributeType("S"));
attributeDefinitions.add(new AttributeDefinition()
    .withAttributeName("Date")
    .withAttributeType("S"));
attributeDefinitions.add(new AttributeDefinition()
    .withAttributeName("Wind")
    .withAttributeType("N"));

// Key schema of the table
ArrayList<KeySchemaElement> tableKeySchema = new ArrayList<KeySchemaElement>();
tableKeySchema.add(new KeySchemaElement()
    .withAttributeName("City")
    .withKeyType(KeyType.HASH));   // Partition key
tableKeySchema.add(new KeySchemaElement()
    .withAttributeName("Date")
    .withKeyType(KeyType.RANGE));  // Sort key

// Wind index: its own throughput settings, projecting all attributes
GlobalSecondaryIndex windIndex = new GlobalSecondaryIndex()
    .withIndexName("WindIndex")
    .withProvisionedThroughput(new ProvisionedThroughput()
        .withReadCapacityUnits(10L)
        .withWriteCapacityUnits(1L))
    .withProjection(new Projection().withProjectionType(ProjectionType.ALL));

// Key schema of the index
ArrayList<KeySchemaElement> indexKeySchema = new ArrayList<KeySchemaElement>();
indexKeySchema.add(new KeySchemaElement()
    .withAttributeName("Date")
    .withKeyType(KeyType.HASH));   // Partition key
indexKeySchema.add(new KeySchemaElement()
    .withAttributeName("Wind")
    .withKeyType(KeyType.RANGE));  // Sort key
windIndex.setKeySchema(indexKeySchema);

CreateTableRequest createTableRequest = new CreateTableRequest()
    .withTableName("ClimateInfo")
    .withProvisionedThroughput(new ProvisionedThroughput()
        .withReadCapacityUnits(5L)
        .withWriteCapacityUnits(1L))
    .withAttributeDefinitions(attributeDefinitions)
    .withKeySchema(tableKeySchema)
    .withGlobalSecondaryIndexes(windIndex);

Table table = dynamoDB.createTable(createTableRequest);
System.out.println(table.getDescription());
Get index information with DescribeTable. First, create an instance of the DynamoDB class. Then, create an instance of the Table class for the target table. Finally, call the table's describe method.
Here is a short example −
import java.util.Iterator;
import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Table;
import com.amazonaws.services.dynamodbv2.model.*;

DynamoDB dynamoDB = new DynamoDB(
    new AmazonDynamoDBClient(new ProfileCredentialsProvider()));

Table table = dynamoDB.getTable("ClimateInfo");
TableDescription tableDesc = table.describe();

// Walk through every global secondary index defined on the table
Iterator<GlobalSecondaryIndexDescription> gsiIter =
    tableDesc.getGlobalSecondaryIndexes().iterator();

while (gsiIter.hasNext()) {
    GlobalSecondaryIndexDescription gsiDesc = gsiIter.next();
    System.out.println("Index data " + gsiDesc.getIndexName() + ":");

    // Print each key attribute of the index and its key type
    Iterator<KeySchemaElement> kseIter = gsiDesc.getKeySchema().iterator();
    while (kseIter.hasNext()) {
        KeySchemaElement kse = kseIter.next();
        System.out.printf("\t%s: %s\n", kse.getAttributeName(), kse.getKeyType());
    }

    Projection projection = gsiDesc.getProjection();
    System.out.println("\tProjection type: " + projection.getProjectionType());

    // Non-key attributes are listed only for INCLUDE projections
    if (projection.getProjectionType().toString().equals("INCLUDE")) {
        System.out.println("\t\tNon-key projected attributes: "
            + projection.getNonKeyAttributes());
    }
}
Use Query to query an index, just as you would query a table. Simply create an instance of the DynamoDB class, an instance of the Table class for the target table, and an instance of the Index class for the specific index, then pass the query object to the index's query method.
Take a look at the following code to better understand −
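import java.util.Iterator;
import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Index;
import com.amazonaws.services.dynamodbv2.document.Item;
import com.amazonaws.services.dynamodbv2.document.ItemCollection;
import com.amazonaws.services.dynamodbv2.document.QueryOutcome;
import com.amazonaws.services.dynamodbv2.document.Table;
import com.amazonaws.services.dynamodbv2.document.spec.QuerySpec;
import com.amazonaws.services.dynamodbv2.document.utils.NameMap;
import com.amazonaws.services.dynamodbv2.document.utils.ValueMap;

DynamoDB dynamoDB = new DynamoDB(
    new AmazonDynamoDBClient(new ProfileCredentialsProvider()));

Table table = dynamoDB.getTable("ClimateInfo");
Index index = table.getIndex("WindIndex");

// "#d" is a placeholder for the attribute name Date in the expression
QuerySpec spec = new QuerySpec()
    .withKeyConditionExpression("#d = :v_date and Wind = :v_wind")
    .withNameMap(new NameMap()
        .with("#d", "Date"))
    .withValueMap(new ValueMap()
        .withString(":v_date", "2016-05-15")
        .withNumber(":v_wind", 0));

// Run the query against the index and print each matching item
ItemCollection<QueryOutcome> items = index.query(spec);
Iterator<Item> iter = items.iterator();

while (iter.hasNext()) {
    System.out.println(iter.next().toJSONPretty());
}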
The following program is a great example for better understanding −
Note − The following program may assume a previously created data source. Before attempting to execute it, acquire the supporting libraries and create the necessary data sources (tables with the required characteristics, or other referenced sources).
This example also uses the Eclipse IDE, the AWS credential file, and the AWS toolkit in the Eclipse AWS Java project.