steampipe plugin install aws

Table: aws_sagemaker_training_job - Query AWS SageMaker Training Jobs using SQL

AWS SageMaker training jobs are part of Amazon SageMaker, a service that lets developers and data scientists build, train, and deploy machine learning (ML) models quickly. A training job is a task with a start and end time in which a specified algorithm trains a model on the data you provide. SageMaker as a whole offers a flexible, end-to-end workflow covering raw data handling, feature engineering, training, and model deployment.

Table Usage Guide

The aws_sagemaker_training_job table in Steampipe provides information about training jobs in AWS SageMaker. Whether you are a data scientist, machine learning engineer, or DevOps engineer, you can query job-specific details, including each job's configuration, status, performance metrics, and associated metadata. Use this table to monitor the progress of training jobs, verify configuration settings, and analyze performance metrics. The schema lists the attributes of each training job, including the job name, creation time, training time, billable time, and associated tags.

Examples

Basic info

Explore the current status of your AWS SageMaker training jobs, along with their creation and last modified times. This can be useful for monitoring job status and understanding the timeline of your machine learning workflows.

select
name,
arn,
training_job_status,
creation_time,
last_modified_time
from
aws_sagemaker_training_job;

Get details of associated ML compute instances and storage volumes for each training job

Explore the configuration of your machine learning compute instances and storage volumes for each training job to better understand the resources being utilized. This can be useful for optimizing costs and resources in your AWS SageMaker training jobs.

PostgreSQL:

select
name,
arn,
resource_config ->> 'InstanceType' as instance_type,
resource_config ->> 'InstanceCount' as instance_count,
resource_config ->> 'VolumeKmsKeyId' as volume_kms_id,
resource_config ->> 'VolumeSizeInGB' as volume_size
from
aws_sagemaker_training_job;

SQLite:

select
name,
arn,
json_extract(resource_config, '$.InstanceType') as instance_type,
json_extract(resource_config, '$.InstanceCount') as instance_count,
json_extract(resource_config, '$.VolumeKmsKeyId') as volume_kms_id,
json_extract(resource_config, '$.VolumeSizeInGB') as volume_size
from
aws_sagemaker_training_job;
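
To see how compute resources are spread across jobs, the per-job query can be rolled up by instance type. The following is a minimal sketch in PostgreSQL syntax. It assumes the InstanceType and InstanceCount keys are populated in resource_config; jobs configured in other ways may show up under a null instance type.

```sql
-- Sketch: count training jobs and total requested instances per instance type.
-- The JSON keys come from the per-job query; the output aliases are illustrative.
select
  resource_config ->> 'InstanceType' as instance_type,
  count(*) as job_count,
  -- ->> returns text, so cast before summing; sum() ignores nulls
  sum((resource_config ->> 'InstanceCount')::bigint) as total_instances
from
  aws_sagemaker_training_job
group by
  resource_config ->> 'InstanceType'
order by
  job_count desc;
```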

List failed training jobs

Identify instances where training jobs have failed in the AWS SageMaker service. This can be useful in troubleshooting and understanding the reasons for failure, thus enabling effective measures to rectify the issues.

select
name,
arn,
training_job_status,
failure_reason
from
aws_sagemaker_training_job
where
training_job_status = 'Failed';
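
When many jobs have accumulated, it can help to narrow the failure list to a recent window. The sketch below uses PostgreSQL syntax, and the seven-day window is an arbitrary value chosen for illustration. The schema lists range operators for creation_time, so this filter may be passed through to the API rather than applied only after all rows are fetched.

```sql
-- Sketch: failed training jobs created in the last 7 days, newest first.
-- The 7-day interval is an assumption; adjust it to your own retention needs.
select
  name,
  failure_reason,
  creation_time
from
  aws_sagemaker_training_job
where
  training_job_status = 'Failed'
  and creation_time >= now() - interval '7 days'
order by
  creation_time desc;
```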

Schema for aws_sagemaker_training_job

| Name | Type | Operators | Description |
| --- | --- | --- | --- |
| _ctx | jsonb | | Steampipe context in JSON form. |
| account_id | text | =, !=, ~~, ~~*, !~~, !~~* | The AWS Account ID in which the resource is located. |
| akas | jsonb | | Array of globally unique identifier strings (also known as) for the resource. |
| algorithm_specification | jsonb | | Information about the algorithm used for training, and algorithm metadata. |
| arn | text | | The Amazon Resource Name (ARN) of the training job. |
| auto_ml_job_arn | text | | The Amazon Resource Name (ARN) of an AutoML job. |
| billable_time_in_seconds | bigint | | The billable time in seconds. Billable time refers to the absolute wall-clock time. |
| checkpoint_config | jsonb | | Contains information about the output location for managed spot training checkpoint data. |
| creation_time | timestamp with time zone | >, >=, <, <= | A timestamp that shows when the training job was created. |
| debug_hook_config | jsonb | | Configuration information for the Debugger hook parameters, metric and tensor collections, and storage paths. |
| debug_rule_configurations | jsonb | | Configuration information for Debugger rules for debugging output tensors. |
| debug_rule_evaluation_statuses | jsonb | | Evaluation status of Debugger rules for debugging on a training job. |
| enable_infra_check | boolean | | Enables an infrastructure health check. |
| enable_inter_container_traffic_encryption | boolean | | To encrypt all communications between ML compute instances in distributed training, choose True. |
| enable_managed_spot_training | boolean | | A Boolean indicating whether managed spot training is enabled or not. |
| enable_network_isolation | boolean | | Specifies whether network isolation is enabled for the training job. |
| enable_remote_debug | boolean | | If set to True, enables remote debugging. |
| environment | jsonb | | The environment variables to set in the Docker container. |
| experiment_config | jsonb | | Associates a SageMaker job as a trial component with an experiment and trial. |
| failure_reason | text | | If the training job failed, the reason it failed. |
| final_metric_data_list | jsonb | | A collection of MetricData objects that specify the names, values, and dates and times that the training algorithm emitted to Amazon CloudWatch. |
| hyper_parameters | jsonb | | Algorithm-specific parameters. |
| input_data_config | jsonb | | An array of Channel objects that describes each data input channel. |
| labeling_job_arn | text | | The Amazon Resource Name (ARN) of the Amazon SageMaker Ground Truth labeling job that created the transform or training job. |
| last_modified_time | timestamp with time zone | >, >=, <, <= | Timestamp when the training job was last modified. |
| maximum_retry_attempts | bigint | | The number of times to retry the job. |
| model_artifacts | jsonb | | Information about the Amazon S3 location that is configured for storing model artifacts. |
| name | text | = | The name of the training job. |
| output_data_config | jsonb | | The S3 path where model artifacts that you configured when creating the job are stored. |
| partition | text | | The AWS partition in which the resource is located (aws, aws-cn, or aws-us-gov). |
| profiler_config | jsonb | | Configuration information for Debugger system monitoring, framework profiling, and storage paths. |
| profiler_rule_configurations | jsonb | | Configuration information for Debugger rules for profiling system and framework metrics. |
| profiler_rule_evaluation_statuses | jsonb | | Evaluation status of Debugger rules for profiling on a training job. |
| profiling_status | text | | Profiling status of a training job. |
| region | text | | The AWS Region in which the resource is located. |
| resource_config | jsonb | | Resources, including ML compute instances and ML storage volumes, that are configured for model training. |
| role_arn | text | | The AWS Identity and Access Management (IAM) role configured for the training job. |
| secondary_status | text | | Provides detailed information about the state of the training job. |
| secondary_status_transitions | jsonb | | A history of all of the secondary statuses that the training job has transitioned through. |
| sp_connection_name | text | =, !=, ~~, ~~*, !~~, !~~* | Steampipe connection name. |
| sp_ctx | jsonb | | Steampipe context in JSON form. |
| stopping_condition | jsonb | | Specifies a limit to how long a model training job can run. |
| tags | jsonb | | A map of tags for the resource. |
| tags_src | jsonb | | A list of tags assigned to the training job. |
| tensor_board_output_config | jsonb | | Configuration of storage locations for the Debugger TensorBoard output data. |
| title | text | | Title of the resource. |
| training_end_time | timestamp with time zone | | A timestamp that shows when the training job ended. |
| training_job_status | text | >, >=, <, <= | The status of the training job. |
| training_start_time | timestamp with time zone | | Indicates the time when the training job starts on training instances. |
| training_time_in_seconds | bigint | | The training time in seconds. |
| tuning_job_arn | text | | The Amazon Resource Name (ARN) of the associated hyperparameter tuning job if the training job was launched by a hyperparameter tuning job. |
| vpc_config | jsonb | | A VpcConfig object that specifies the VPC that this training job has access to. |
| warm_pool_status | jsonb | | The status of the warm pool associated with the training job. |
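
The training_time_in_seconds, billable_time_in_seconds, and enable_managed_spot_training columns can be combined to estimate savings from managed spot training. The sketch below uses PostgreSQL syntax and applies the commonly cited formula (1 - billable time / training time) * 100. The restriction to 'Completed' jobs and the spot_savings_percent alias are assumptions made for illustration.

```sql
-- Sketch: estimated managed spot training savings per completed job.
-- nullif() avoids division by zero when no training time was recorded.
select
  name,
  training_time_in_seconds,
  billable_time_in_seconds,
  round(
    100.0 * (1 - billable_time_in_seconds::numeric
                 / nullif(training_time_in_seconds, 0)),
    1
  ) as spot_savings_percent
from
  aws_sagemaker_training_job
where
  enable_managed_spot_training
  and training_job_status = 'Completed';
```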

Export

This table is available as a standalone Exporter CLI. Steampipe exporters are stand-alone binaries that allow you to extract data using Steampipe plugins without a database.

You can download the tarball for your platform from the Releases page, but the simplest way to install the exporter is with the steampipe_export_installer.sh script:

/bin/sh -c "$(curl -fsSL https://steampipe.io/install/export.sh)" -- aws

You can pass the configuration to the command with the --config argument:

steampipe_export_aws --config '<your_config>' aws_sagemaker_training_job