Importing packages – Pipelines using TensorFlow Extended-1
Step 6: Importing packages
Run the following lines of code to import the required packages:
import tensorflow as tf
import tensorflow_transform as tft
from tensorflow import keras
from tensorflow_transform.tf_metadata import schema_utils
from tfx import v1 as tfx
from tfx_bsl.public import tfxio
from tfx.components.base import executor_spec
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext
from tensorflow_metadata.proto.v0 import schema_pb2
import os
from typing import List
Step 7: Understanding a few TFX components from code
Even before jumping into pipeline creation and execution, let us try to understand how a few of the TFX components can be used individually and analyze the output of those components. Let us start with the ExampleGen component.
Examplegen: Run the following code in a new cell:
context_in = InteractiveContext()
example_gen_csv = tfx.components.CsvExampleGen(input_base=INPUT_DATA_DIR)
context_in.run(example_gen_csv)
The ExampleGen component can read data from various sources and formats, such as CSV files, TFRecord files, and BigQuery. Once the run is complete, an interactive widget displaying the results of ExampleGen appears in the notebook, as shown in Figure 8.6. ExampleGen typically generates two types of artifacts, known as training and evaluation examples. By default, ExampleGen divides the data into two-thirds for the training set and one-third for the evaluation set. The location where these artifacts are stored can also be viewed, as shown:

Figure 8.6: Output for example_gen component
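The default 2:1 split mentioned above is worth pausing on. ExampleGen assigns records to splits by hashing, so the same record always lands in the same split across runs. The following is a minimal, self-contained Python sketch of that idea (plain Python, not TFX code; the record format is made up for illustration):

```python
import hashlib

def split_records(records, train_buckets=2, eval_buckets=1):
    """Mimic ExampleGen's default 2:1 hash-bucket split."""
    total = train_buckets + eval_buckets
    train, evaluation = [], []
    for rec in records:
        # Hash the serialized record into one of `total` buckets;
        # the first `train_buckets` buckets go to the training set.
        bucket = int(hashlib.sha256(rec.encode()).hexdigest(), 16) % total
        (train if bucket < train_buckets else evaluation).append(rec)
    return train, evaluation

rows = [f"id_{i},value_{i}" for i in range(300)]
train, evaluation = split_records(rows)
print(len(train), len(evaluation))  # roughly a 2:1 ratio
```

Because the assignment is deterministic, re-running the split on the same data reproduces the same partition, which is exactly what a pipeline needs for reproducibility.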
StatisticsGen: Run the following code in a new cell:
gen_statistics = tfx.components.StatisticsGen(examples=example_gen_csv.outputs['examples'])
context_in.run(gen_statistics)
context_in.show(gen_statistics.outputs['statistics'])
The StatisticsGen component computes statistics over your dataset. These statistics provide a quick overview of your data, including details such as shape, features, and value distributions. The output from ExampleGen is used as the input to compute statistics about the data. Once the run is complete, an interactive widget displaying the statistics of the train and evaluation datasets separately appears, as shown in Figure 8.7:

Figure 8.7: Output for statistics_gen component
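To make the kind of summary StatisticsGen produces more concrete, here is a small plain-Python sketch (not the TFX implementation) that computes per-feature counts, missing-value counts, and min/mean/max over a list of records; the feature names are invented for illustration:

```python
def feature_stats(records):
    """Compute per-feature count/missing/min/max/mean,
    similar in spirit to what StatisticsGen reports."""
    stats = {}
    for row in records:
        for name, value in row.items():
            s = stats.setdefault(name, {"count": 0, "missing": 0, "values": []})
            if value is None:
                s["missing"] += 1
            else:
                s["count"] += 1
                s["values"].append(value)
    for s in stats.values():
        vals = s.pop("values")
        # Numeric summaries only make sense for numeric features.
        if vals and isinstance(vals[0], (int, float)):
            s["min"], s["max"] = min(vals), max(vals)
            s["mean"] = sum(vals) / len(vals)
    return stats

data = [{"trip_miles": 1.2, "company": "A"},
        {"trip_miles": 3.4, "company": "B"},
        {"trip_miles": None, "company": "A"}]
print(feature_stats(data))
```

StatisticsGen computes far richer distributions (histograms, quantiles, top values per split), but the principle is the same: one pass over the examples, aggregating per feature.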
SchemaGen: Run the following code in a new cell:
gen_schema = tfx.components.SchemaGen(statistics=gen_statistics.outputs['statistics'])
context_in.run(gen_schema)
context_in.show(gen_schema.outputs['schema'])
From these statistics, the SchemaGen component generates a schema for your data. A schema is simply a data definition: it defines the data features' types, expected properties, bounds, and so on. The output of SchemaGen is shown in Figure 8.8:

Figure 8.8: Output for schema_gen component
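The idea of inferring a schema from observed data can be sketched in a few lines of plain Python (this is an analogy, not SchemaGen's actual algorithm; the feature names are invented):

```python
def infer_schema(records):
    """Infer a minimal schema (feature name -> type name) from records,
    loosely analogous to what SchemaGen derives from statistics."""
    observed = {}
    for row in records:
        for name, value in row.items():
            if value is None:
                continue
            observed.setdefault(name, set()).add(type(value).__name__)
    # A feature with more than one observed type is flagged for review.
    return {name: (types.pop() if len(types) == 1 else "mixed")
            for name, types in observed.items()}

data = [{"trip_miles": 1.2, "company": "A"},
        {"trip_miles": 3.4, "company": "B"}]
print(infer_schema(data))  # {'trip_miles': 'float', 'company': 'str'}
```

The real schema is a `schema_pb2.Schema` protocol buffer, which additionally records domains, valency, and presence constraints, and it is meant to be curated by hand after the initial inference.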
ExampleValidator: Run the following code in a new cell:
stats_validate = tfx.components.ExampleValidator(statistics=gen_statistics.outputs['statistics'], schema=gen_schema.outputs['schema'])
context_in.run(stats_validate)
context_in.show(stats_validate.outputs['anomalies'])
Based on the defined schema, this component validates your data and detects anomalies. In production, it can be used to validate any new data entering your pipeline. It can detect drift and skew in new data, unexpected types, and new columns that were not in the schema. The output of ExampleValidator is shown in Figure 8.9:

Figure 8.9: Output of example validator component
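Checking new data against a schema, as ExampleValidator does, can be illustrated with a small plain-Python sketch that reuses the toy schema idea from above (again an analogy, not the TFX implementation; the feature names are invented):

```python
def validate(records, schema):
    """Check records against a schema dict (feature name -> type name),
    reporting anomalies much like ExampleValidator does."""
    anomalies = []
    for i, row in enumerate(records):
        for name, value in row.items():
            if name not in schema:
                # A column the schema has never seen.
                anomalies.append((i, name, "unexpected column"))
            elif value is not None and type(value).__name__ != schema[name]:
                # A value whose type disagrees with the schema.
                anomalies.append((i, name, "unexpected type"))
    return anomalies

schema = {"trip_miles": "float", "company": "str"}
new_data = [{"trip_miles": "3.4", "company": "A", "tip": 1.0}]
print(validate(new_data, schema))
# [(0, 'trip_miles', 'unexpected type'), (0, 'tip', 'unexpected column')]
```

In the real component, anomalies are emitted as an `anomalies` artifact that the interactive widget renders; an empty anomaly list is the signal that new data is safe to push further down the pipeline.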