DynamoDB - Scanning

A scan operation reads every item in a table or secondary index. By default, it returns all data attributes of every item in the table or index. Use the ProjectionExpression parameter to return only a subset of attributes.

Every scan returns a result set, even when it finds no matching items; in that case the set is empty. A single scan retrieves no more than 1 MB of data, and you can optionally filter that data.

Note. The scan options and filtering described here also apply to queries.

Types of scan operations

Filtering - Scan operations support fine-grained filtering through filter expressions. A filter expression discards unwanted items after a scan or query has read them but before the results are returned. Filter expressions use comparison operators, and their syntax resembles that of condition expressions. In a query, however, a filter expression cannot reference the partition key or sort key; those conditions belong in the key condition expression.

Note. The 1 MB limit applies before any filter is applied.
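
The following Java sketch illustrates the order of operations described above. It simulates a single scan call against a hypothetical in-memory table. The classes FilterItem, FilterPage, and FilterAfterLimitSketch are invented for illustration and are not part of the AWS SDK. The sketch assumes that the call reads items until the next one would push it past 1 MB, and only then applies the filter. As a result, a page can come back empty even though matching data remains. A list index stands in for ExclusiveStartKey.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical in-memory item: a key, its stored size, and one filterable attribute.
final class FilterItem {
    final int id;
    final int sizeBytes;
    final int price;

    FilterItem(int id, int sizeBytes, int price) {
        this.id = id;
        this.sizeBytes = sizeBytes;
        this.price = price;
    }
}

// What one simulated scan call returns.
final class FilterPage {
    final List<FilterItem> items;   // items that passed the filter (what Count reports)
    final int scannedCount;         // items read before filtering (what ScannedCount reports)
    final Integer lastEvaluatedKey; // id of the last item read; null at the end of the table

    FilterPage(List<FilterItem> items, int scannedCount, Integer lastEvaluatedKey) {
        this.items = items;
        this.scannedCount = scannedCount;
        this.lastEvaluatedKey = lastEvaluatedKey;
    }
}

final class FilterAfterLimitSketch {
    static final int PAGE_LIMIT_BYTES = 1024 * 1024; // the 1 MB per-call limit

    // Step 1: read items in order until the next one would exceed 1 MB.
    // Step 2: only then apply the filter to the items that were read.
    // Assumes each item is far smaller than 1 MB, so every call reads at least one item.
    static FilterPage scanPage(List<FilterItem> table, int startIndex, Predicate<FilterItem> filter) {
        int bytesRead = 0;
        int end = startIndex;
        while (end < table.size() && bytesRead + table.get(end).sizeBytes <= PAGE_LIMIT_BYTES) {
            bytesRead += table.get(end).sizeBytes;
            end++;
        }
        List<FilterItem> matched = new ArrayList<>();
        for (int i = startIndex; i < end; i++) {
            if (filter.test(table.get(i))) {
                matched.add(table.get(i));
            }
        }
        Integer lastKey = (end < table.size()) ? table.get(end - 1).id : null;
        return new FilterPage(matched, end - startIndex, lastKey);
    }
}
```

Consider ten items of 300 KB each and a filter that matches only the fifth item. The first call reads three items and returns none of them, yet its non-null last key shows that the scan is not finished.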

Throughput Specifications - Scans consume throughput, but consumption depends on the size of the items read rather than on the data returned. Consumption is the same whether you request every attribute or only a few, and a filter expression does not reduce it either.
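
The following rough worked example shows why projections and filters do not change consumption. It estimates read capacity units from the sizes of the items a scan evaluates. It assumes three rules:

- The sizes of all evaluated items are summed and rounded up to the next 4 KB.
- One strongly consistent read unit covers 4 KB.
- An eventually consistent read costs half as much.

The class and method names are hypothetical, not an AWS SDK API.

```java
// Hypothetical helper, not an AWS SDK API: a rough read-capacity estimate for one scan call.
final class ScanCapacitySketch {
    static final int READ_UNIT_BYTES = 4 * 1024; // one strongly consistent read unit covers 4 KB

    // Inputs are the sizes of the items the scan EVALUATED. The number of items that
    // survive a filter, and the attributes named in a projection, are deliberately
    // not parameters, because they do not change the estimate.
    static double estimateReadUnits(int[] evaluatedItemSizes, boolean stronglyConsistent) {
        long totalBytes = 0;
        for (int size : evaluatedItemSizes) {
            totalBytes += size;
        }
        // Round the combined size up to the next 4 KB boundary (ceiling division).
        long units = (totalBytes + READ_UNIT_BYTES - 1) / READ_UNIT_BYTES;
        // Eventually consistent reads are assumed to cost half as much.
        return stronglyConsistent ? units : units / 2.0;
    }
}
```

Take three evaluated items of 1,000, 2,000, and 3,000 bytes (6,000 bytes in total). The estimate is 2 units with strongly consistent reads and 1 unit with eventually consistent reads, however many of the three items the filter keeps.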

Pagination - DynamoDB paginates scan results, dividing them into pages. Each page is limited to 1 MB; when the results exceed that size, another scan is needed to retrieve the rest of the data. The LastEvaluatedKey value from the previous response makes this possible: pass it as the ExclusiveStartKey parameter of the next scan. When LastEvaluatedKey is null (absent), the operation has read all pages of data. A non-null value, however, does not necessarily mean that more data remains; only a null value reliably indicates completion.
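
The pagination loop can be sketched in Java as follows. The Function argument is a hypothetical stand-in for a single scan call. It takes an ExclusiveStartKey (null on the first call) and returns a page together with its LastEvaluatedKey. In the AWS SDK for Java 1.x, the corresponding calls are ScanRequest.withExclusiveStartKey and ScanResult.getLastEvaluatedKey. The fakeScan helper and the PaginatedPage class are invented for illustration. They page over an in-memory list by item count rather than by the 1 MB limit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// One page returned by a (hypothetical) single scan call.
final class PaginatedPage {
    final List<String> items;
    final String lastEvaluatedKey; // null means there are no more pages

    PaginatedPage(List<String> items, String lastEvaluatedKey) {
        this.items = items;
        this.lastEvaluatedKey = lastEvaluatedKey;
    }
}

final class ScanPaginationSketch {
    // Repeats the scan, feeding each LastEvaluatedKey back in as the next
    // ExclusiveStartKey, until a page comes back with a null LastEvaluatedKey.
    static List<String> scanAll(Function<String, PaginatedPage> scanOnce) {
        List<String> all = new ArrayList<>();
        String exclusiveStartKey = null; // the first call starts at the beginning
        do {
            PaginatedPage page = scanOnce.apply(exclusiveStartKey);
            all.addAll(page.items);
            exclusiveStartKey = page.lastEvaluatedKey;
        } while (exclusiveStartKey != null);
        return all;
    }

    // Hypothetical fake scan over an in-memory list of unique keys: returns at most
    // pageSize items (pageSize >= 1) per call instead of applying the 1 MB limit.
    static Function<String, PaginatedPage> fakeScan(List<String> table, int pageSize) {
        return startKey -> {
            int from = (startKey == null) ? 0 : table.indexOf(startKey) + 1;
            int to = Math.min(from + pageSize, table.size());
            List<String> items = new ArrayList<>(table.subList(from, to));
            String lastKey = (to < table.size()) ? table.get(to - 1) : null;
            return new PaginatedPage(items, lastKey);
        };
    }
}
```

With five keys and a page size of 2, scanAll makes three calls and collects all five keys in order.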

Limit Parameter - The Limit parameter controls the size of the result. DynamoDB uses it as the maximum number of items to evaluate before returning data, and does not evaluate items beyond that number. If you set the value to x, DynamoDB evaluates at most x items and returns those among them that match any filter, so the response may contain fewer than x items.

A LastEvaluatedKey value is also returned when the Limit parameter stops a scan with only partial results. Pass it as ExclusiveStartKey to continue the scan.
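
The following sketch shows how Limit interacts with a filter, again over a hypothetical in-memory table of integer keys. At most limit items are evaluated, the filter is applied only to those, and the returned LastEvaluatedKey lets the next call resume. The class names are invented for illustration. For simplicity, the sketch returns a null LastEvaluatedKey as soon as it reaches the end of the table, whereas DynamoDB may still return a key in that case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Result of one simulated scan call made with a Limit.
final class LimitedPage {
    final List<Integer> items;      // evaluated items that passed the filter
    final Integer lastEvaluatedKey; // null once this sketch reaches the end of the table

    LimitedPage(List<Integer> items, Integer lastEvaluatedKey) {
        this.items = items;
        this.lastEvaluatedKey = lastEvaluatedKey;
    }
}

final class ScanLimitSketch {
    // Evaluates at most `limit` items after exclusiveStartKey (null = start of table),
    // then applies the filter to those evaluated items only.
    static LimitedPage scanWithLimit(List<Integer> table, Integer exclusiveStartKey,
                                     int limit, IntPredicate filter) {
        int from = (exclusiveStartKey == null) ? 0 : table.indexOf(exclusiveStartKey) + 1;
        int to = Math.min(from + limit, table.size());
        List<Integer> matched = new ArrayList<>();
        for (int i = from; i < to; i++) {
            if (filter.test(table.get(i))) {
                matched.add(table.get(i));
            }
        }
        Integer lastKey = (to < table.size()) ? table.get(to - 1) : null;
        return new LimitedPage(matched, lastKey);
    }
}
```

Take keys 1 to 10, a Limit of 4, and a filter for even keys. The first call evaluates 1 to 4 and returns only 2 and 4. The next call, started from key 4, returns 6 and 8.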

Result Counts - Query and scan responses also include ScannedCount and Count. ScannedCount is the number of items evaluated before any filter is applied, and Count is the number of items returned after filtering. Without a filter, the two values are identical. When a scan exceeds 1 MB, both counts cover only the portion processed in that response.

Consistency - Query and scan results are eventually consistent by default, but you can request strongly consistent reads instead. Use the ConsistentRead parameter to change this setting.

Note. Strongly consistent reads consume twice as many read capacity units as eventually consistent reads.

Performance - Queries generally perform better than scans, because a scan reads an entire table or secondary index, which leads to slow responses and heavy throughput consumption. Scans work best for small tables and searches with few filters. For larger tables, you can keep a scan lean by following best practices, such as avoiding sudden bursts of read activity and using parallel scans.

A query reads a specific range of keys that satisfy a given condition, so its performance depends on the amount of data it retrieves rather than on the total number of keys in the table. The parameters of the operation and the number of matching items have a particular impact on performance.

Parallel Scan

By default, a scan processes data sequentially. It returns data in 1 MB portions, after which the application must request the next portion. For large tables and indexes, this makes scans slow.

This also means that a scan may not fully use the table's available throughput. DynamoDB distributes table data across multiple partitions, but a sequential scan reads only one partition at a time, so its throughput is limited to that of a single partition.

The solution is to divide the table or index logically into segments, which separate "workers" (threads or processes) then scan in parallel. Each worker uses the Segment parameter to specify which segment it scans and the TotalSegments parameter to specify the total number of segments.
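
A minimal sketch of the worker model follows. It assumes an in-memory table and a hypothetical segment-assignment rule: key hash modulo TotalSegments. DynamoDB's actual assignment of items to segments is internal to the service. Each worker receives its own Segment number and the shared TotalSegments value, scans only its segment, and the results are merged. In the AWS SDK for Java 1.x, the two values are set with ScanRequest.withSegment and ScanRequest.withTotalSegments. In a real parallel scan, each worker would also page through its segment with LastEvaluatedKey; that step is omitted here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelScanSketch {
    // Hypothetical segment rule for this sketch only: floorMod(hash, totalSegments).
    static boolean inSegment(String key, int segment, int totalSegments) {
        return Math.floorMod(key.hashCode(), totalSegments) == segment;
    }

    // One worker's job: scan only the items that belong to its segment.
    static List<String> scanSegment(List<String> table, int segment, int totalSegments) {
        List<String> out = new ArrayList<>();
        for (String key : table) {
            if (inSegment(key, segment, totalSegments)) {
                out.add(key);
            }
        }
        return out;
    }

    // Starts one worker thread per segment and merges their results.
    static List<String> parallelScan(List<String> table, int totalSegments) {
        ExecutorService pool = Executors.newFixedThreadPool(totalSegments);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (int segment = 0; segment < totalSegments; segment++) {
                final int seg = segment; // effectively final copy for the lambda
                futures.add(pool.submit(() -> scanSegment(table, seg, totalSegments)));
            }
            List<String> all = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                all.addAll(f.get()); // waits for each worker to finish
            }
            return all;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("parallel scan interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a segment worker failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because every key falls into exactly one segment, the merged result contains each item once, although not necessarily in table order.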

Worker Number - To achieve the best application performance, experiment with the number of workers, which is set by the TotalSegments parameter.

Note. A parallel scan with many workers can consume all of the table's throughput. Use the Limit parameter to prevent any single worker from consuming all of it.

Below is an example of a parallel scan.

Note. The following program may assume a previously created data source. Before running it, obtain the supporting libraries and create the necessary data sources (tables with the required characteristics, or other referenced sources).

This example also uses the Eclipse IDE, an AWS credentials file, and the AWS Toolkit within an Eclipse AWS Java project.