# Table: aws_sagemaker_training_job - Query AWS SageMaker Training Jobs using SQL
AWS SageMaker training jobs are part of the Amazon SageMaker service, which provides developers and data scientists with the ability to build, train, and deploy machine learning (ML) models quickly. A training job in SageMaker is a task with a start and end time in which a specified algorithm is used to train a model with the data you provide. SageMaker offers a flexible, end-to-end solution covering raw data handling, feature engineering, training, and model deployment.
## Table Usage Guide
The `aws_sagemaker_training_job` table in Steampipe provides you with information about training jobs within AWS SageMaker. This table allows you, whether you're a data scientist, machine learning engineer, or DevOps engineer, to query job-specific details, including the configuration of the training job, status, performance metrics, and associated metadata. You can utilize this table to monitor the progress of training jobs, verify configuration settings, analyze performance metrics, and more. The schema outlines the various attributes of the training job for you, including the job name, creation time, training time, billable time, and associated tags.
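For instance, since the schema below exposes both `training_time_in_seconds` and `billable_time_in_seconds`, a minimal cost-oriented sketch (using only columns from this table) might look like:

```sql
-- Sketch: compare wall-clock training time with billable time per job,
-- which is most interesting when managed spot training is enabled.
select
  name,
  training_time_in_seconds,
  billable_time_in_seconds,
  enable_managed_spot_training
from
  aws_sagemaker_training_job
where
  billable_time_in_seconds is not null
order by
  billable_time_in_seconds desc;
```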
## Examples
### Basic info
Explore which AWS SageMaker training jobs are active or inactive, along with their respective creation and last modified times. This can be useful for monitoring job status and understanding the timeline of your machine learning workflows.
```sql
select
  name,
  arn,
  training_job_status,
  creation_time,
  last_modified_time
from
  aws_sagemaker_training_job;
```
### Get details of associated ML compute instances and storage volumes for each training job
Explore the configuration of your machine learning compute instances and storage volumes for each training job to better understand the resources being utilized. This can be useful for optimizing costs and resources in your AWS SageMaker training jobs.
```sql
-- PostgreSQL
select
  name,
  arn,
  resource_config ->> 'InstanceType' as instance_type,
  resource_config ->> 'InstanceCount' as instance_count,
  resource_config ->> 'VolumeKmsKeyId' as volume_kms_id,
  resource_config ->> 'VolumeSizeInGB' as volume_size
from
  aws_sagemaker_training_job;
```

```sql
-- SQLite
select
  name,
  arn,
  json_extract(resource_config, '$.InstanceType') as instance_type,
  json_extract(resource_config, '$.InstanceCount') as instance_count,
  json_extract(resource_config, '$.VolumeKmsKeyId') as volume_kms_id,
  json_extract(resource_config, '$.VolumeSizeInGB') as volume_size
from
  aws_sagemaker_training_job;
```
### List failed training jobs
Identify training jobs that have failed in the AWS SageMaker service. This can be useful for troubleshooting and understanding the reasons for failure so you can take corrective action.
```sql
select
  name,
  arn,
  training_job_status,
  failure_reason
from
  aws_sagemaker_training_job
where
  training_job_status = 'Failed';
```
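Building on this, a follow-up sketch that aggregates failures by reason can help surface recurring problems across many jobs (it uses only columns from the schema below):

```sql
-- Sketch: count failed jobs per failure reason to spot recurring issues.
select
  failure_reason,
  count(*) as failed_jobs
from
  aws_sagemaker_training_job
where
  training_job_status = 'Failed'
group by
  failure_reason
order by
  failed_jobs desc;
```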
## Control examples
- All Controls > SageMaker > SageMaker training jobs should be enabled with inter-container traffic encryption
- All Controls > SageMaker > SageMaker training jobs should be in VPC
- All Controls > SageMaker > SageMaker training jobs should have network isolation enabled
- All Controls > SageMaker > SageMaker training jobs volumes and outputs should have KMS encryption enabled
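These controls live in Powerpipe mods, so the query below is only an illustrative sketch of the conditions they evaluate, expressed against this table's columns (PostgreSQL syntax); it is not the controls' actual implementation:

```sql
-- Illustrative only: rough per-job view of the settings the controls above check.
select
  name,
  enable_inter_container_traffic_encryption,
  enable_network_isolation,
  vpc_config is not null as has_vpc_config,
  resource_config ->> 'VolumeKmsKeyId' is not null as volume_kms_key_set
from
  aws_sagemaker_training_job;
```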
## Schema for aws_sagemaker_training_job
| Name | Type | Operators | Description |
|---|---|---|---|
| _ctx | jsonb | | Steampipe context in JSON form. |
| account_id | text | =, !=, ~~, ~~*, !~~, !~~* | The AWS Account ID in which the resource is located. |
| akas | jsonb | | Array of globally unique identifier strings (also known as) for the resource. |
| algorithm_specification | jsonb | | Information about the algorithm used for training, and algorithm metadata. |
| arn | text | | The Amazon Resource Name (ARN) of the training job. |
| auto_ml_job_arn | text | | The Amazon Resource Name (ARN) of an AutoML job. |
| billable_time_in_seconds | bigint | | The billable time in seconds. Billable time refers to the absolute wall-clock time. |
| checkpoint_config | jsonb | | Contains information about the output location for managed spot training checkpoint data. |
| creation_time | timestamp with time zone | >, >=, <, <= | A timestamp that shows when the training job was created. |
| debug_hook_config | jsonb | | Configuration information for the Debugger hook parameters, metric and tensor collections, and storage paths. |
| debug_rule_configurations | jsonb | | Configuration information for Debugger rules for debugging output tensors. |
| debug_rule_evaluation_statuses | jsonb | | Evaluation status of Debugger rules for debugging on a training job. |
| enable_infra_check | boolean | | Enables an infrastructure health check. |
| enable_inter_container_traffic_encryption | boolean | | To encrypt all communications between ML compute instances in distributed training, choose True. |
| enable_managed_spot_training | boolean | | A Boolean indicating whether managed spot training is enabled or not. |
| enable_network_isolation | boolean | | Specifies whether network isolation is enabled for the training job. |
| enable_remote_debug | boolean | | If set to True, enables remote debugging. |
| environment | jsonb | | The environment variables to set in the Docker container. |
| experiment_config | jsonb | | Associates a SageMaker job as a trial component with an experiment and trial. |
| failure_reason | text | | If the training job failed, the reason it failed. |
| final_metric_data_list | jsonb | | A collection of MetricData objects that specify the names, values, and dates and times that the training algorithm emitted to Amazon CloudWatch. |
| hyper_parameters | jsonb | | Algorithm-specific parameters. |
| input_data_config | jsonb | | An array of Channel objects that describes each data input channel. |
| labeling_job_arn | text | | The Amazon Resource Name (ARN) of the Amazon SageMaker Ground Truth labeling job that created the transform or training job. |
| last_modified_time | timestamp with time zone | >, >=, <, <= | Timestamp when the training job was last modified. |
| maximum_retry_attempts | bigint | | The number of times to retry the job. |
| model_artifacts | jsonb | | Information about the Amazon S3 location that is configured for storing model artifacts. |
| name | text | = | The name of the training job. |
| output_data_config | jsonb | | The S3 path where model artifacts that you configured when creating the job are stored. |
| partition | text | | The AWS partition in which the resource is located (aws, aws-cn, or aws-us-gov). |
| profiler_config | jsonb | | Configuration information for Debugger system monitoring, framework profiling, and storage paths. |
| profiler_rule_configurations | jsonb | | Configuration information for Debugger rules for profiling system and framework metrics. |
| profiler_rule_evaluation_statuses | jsonb | | Evaluation status of Debugger rules for profiling on a training job. |
| profiling_status | text | | Profiling status of a training job. |
| region | text | | The AWS Region in which the resource is located. |
| resource_config | jsonb | | Resources, including ML compute instances and ML storage volumes, that are configured for model training. |
| role_arn | text | | The AWS Identity and Access Management (IAM) role configured for the training job. |
| secondary_status | text | | Provides detailed information about the state of the training job. |
| secondary_status_transitions | jsonb | | A history of all of the secondary statuses that the training job has transitioned through. |
| sp_connection_name | text | =, !=, ~~, ~~*, !~~, !~~* | Steampipe connection name. |
| sp_ctx | jsonb | | Steampipe context in JSON form. |
| stopping_condition | jsonb | | Specifies a limit to how long a model training job can run. |
| tags | jsonb | | A map of tags for the resource. |
| tags_src | jsonb | | A list of tags assigned to the training job. |
| tensor_board_output_config | jsonb | | Configuration of storage locations for the Debugger TensorBoard output data. |
| title | text | | Title of the resource. |
| training_end_time | timestamp with time zone | | A timestamp that shows when the training job ended. |
| training_job_status | text | >, >=, <, <= | The status of the training job. |
| training_start_time | timestamp with time zone | | Indicates the time when the training job starts on training instances. |
| training_time_in_seconds | bigint | | The training time in seconds. |
| tuning_job_arn | text | | The Amazon Resource Name (ARN) of the associated hyperparameter tuning job if the training job was launched by a hyperparameter tuning job. |
| vpc_config | jsonb | | A VpcConfig object that specifies the VPC that this training job has access to. |
| warm_pool_status | jsonb | | The status of the warm pool associated with the training job. |
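The Operators column above indicates qualifiers that Steampipe can apply while listing (for example on `name`, `creation_time`, `last_modified_time`, and `training_job_status`). As a minimal sketch, a time-bounded query might look like this (PostgreSQL syntax; the 7-day window is arbitrary):

```sql
-- Sketch: training jobs created in the last 7 days, filtered via creation_time.
select
  name,
  training_job_status,
  creation_time
from
  aws_sagemaker_training_job
where
  creation_time >= now() - interval '7 days';
```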
## Export
This table is available as a standalone Exporter CLI. Steampipe exporters are standalone binaries that allow you to extract data using Steampipe plugins without a database.
You can download the tarball for your platform from the Releases page, but it is simplest to install them with the `steampipe_export_installer.sh` script:

```sh
/bin/sh -c "$(curl -fsSL https://steampipe.io/install/export.sh)" -- aws
```
You can pass the configuration to the command with the `--config` argument:

```sh
steampipe_export_aws --config '<your_config>' aws_sagemaker_training_job
```