
Code smell – Control Freak – Introduction

An excellent example of a code smell is using the new keyword. This indicates a hardcoded dependency where the creator controls the new object and its lifetime. This is also known as the Control Freak anti-pattern, but I prefer to box it as a code smell instead of an anti-pattern since the new keyword is not intrinsically wrong.

At this point, you may be wondering how it is possible not to use the new keyword in object-oriented programming, but rest assured, we will cover that and expand on the Control Freak code smell in Chapter 7, Deep Dive into Dependency Injection.
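
To give a rough idea before we get there, here is a minimal, hypothetical sketch (all names are invented for illustration; the idea is the same in any object-oriented language). The first class creates, and therefore controls, its own dependency, while the second receives it from the outside:
class SqlReportRepository:
    # Hypothetical dependency used only for illustration.
    def get_reports(self):
        return []

class ControlFreakReportService:
    # Code smell: the service creates its own dependency, so the concrete type is
    # hardcoded and the service controls the object's lifetime (hard to swap or test).
    def __init__(self):
        self.repository = SqlReportRepository()

class ReportService:
    # The dependency is supplied from the outside (dependency injection), so the
    # caller decides which implementation to use and owns its lifetime.
    def __init__(self, repository):
        self.repository = repository

# Usage: the consumer composes the object graph.
service = ReportService(SqlReportRepository())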

Code smell – Long Methods

The long methods code smell occurs when a method extends to more than 10 to 15 lines of code. That is a good indicator that you should think about that method differently. Comments that separate multiple code blocks are another good indicator of a method that may be too long. Here are a few examples of what the problem might be:

  • The method contains complex logic intertwined in multiple conditional statements.
  • The method contains a big switch block.
  • The method does too many things.
  • The method contains duplications of code.

To fix this, you could do the following:

  • Extract one or more private methods.
  • Extract some code to new classes.
  • Reuse the code from external classes.
  • If you have a lot of conditional statements or a huge switch block, you could leverage a design pattern such as the Chain of Responsibility, or CQRS, which you will learn about in Chapter 10, Behavioral Patterns, and Chapter 14, Mediator and CQRS Design Patterns.

Usually, each problem has one or more solutions; you need to spot the problem and then find, choose, and implement one of the solutions. Let’s be clear: a method containing 16 lines does not necessarily need refactoring; it could be OK. Remember that a code smell indicates that there might be a problem, not that there necessarily is one—apply common sense.
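
To make the first fix concrete, here is a minimal, hypothetical sketch of a long method whose comment-separated blocks have been extracted into small private methods (the order model and all names are invented for illustration):
class OrderProcessor:
    def process(self, order):
        # Each former comment-separated block is now a small, well-named private method.
        self._validate(order)
        total = self._calculate_total(order)
        self._notify_customer(order, total)
        return total

    def _validate(self, order):
        if not order.get("items"):
            raise ValueError("An order must contain at least one item.")

    def _calculate_total(self, order):
        return sum(item["price"] * item["quantity"] for item in order["items"])

    def _notify_customer(self, order, total):
        print(f"Order for {order['customer']} confirmed; total: {total}")

# Usage
OrderProcessor().process({"customer": "Ada", "items": [{"price": 10.0, "quantity": 2}]})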

What is a design pattern? – Introduction

Since you just purchased a book about design patterns, I guess you have some idea of what design patterns are, but let’s make sure that we are on the same page.

Abstract definition: A design pattern is a proven technique that we can use to solve a specific problem.

In this book, we apply different patterns to solve various problems and leverage some open-source tools to go further, faster! Abstract definitions make people sound smart, but understanding concepts requires practice, and there is no better way to learn than by experimenting with something; design patterns are no different. If that definition does not make sense to you yet, don’t worry. You should have enough information by the end of the book to correlate the multiple practical examples and explanations with that definition, making it crystal clear.

I like to compare programming to playing with LEGO® because what you have to do is very similar: put small pieces together to create something bigger. Therefore, if you lack imagination or skills, possibly because you are too young, your castle might not look as good as that of someone with more experience. With that analogy in mind, a design pattern is a plan to assemble a solution that fits one or more scenarios, like the tower of a castle. Once you have designed a single tower, you can build multiple towers by following the same steps. Design patterns act as that tower plan and give you the tools to assemble reliable pieces to improve your masterpiece (program). However, instead of snapping LEGO® blocks together, you nest code blocks and interweave objects in a virtual environment!

Before going into more detail, keep in mind that well-thought-out applications of design patterns should improve your application designs. That is true whether you are designing a small component or a whole system. However, be careful: throwing patterns into the mix just to use them can lead to the opposite result: over-engineering. Instead, aim to write the least amount of readable code that solves your issue or automates your process.

As we have briefly mentioned, design patterns apply to different software engineering levels, and in this book, we start small and grow to cloud scale! We follow a smooth learning curve, starting with simpler patterns and code samples that bend good practices to focus on the patterns, and ending with more advanced topics and good practices. Of course, some subjects are overviews more than deep dives, like automated testing, because no one can fit it all in a single book. Nonetheless, I’ve done my best to give you as much information about architecture-related subjects as possible to ensure the proper foundations are in place for you to get as much as possible out of the more advanced topics, and I sincerely hope you’ll find this book a helpful and enjoyable read.

Let’s start with the opposite of design patterns because it is essential to identify wrong ways of doing things to avoid making those mistakes or to correct them when you see them. Of course, knowing the right way to overcome specific problems using design patterns is also crucial.

Explanations for tabular data (classification) – Explainable AI

Once the model is deployed successfully, open JupyterLab from the workbench that was created and run the Python code given in the following steps.
Step 1: Input for prediction and explanation
Select any record from the data, modify it into the format shown below, and run the cell:
instances_tabular = [{"BMI": "16.6", "Smoking": "Yes", "AlcoholDrinking": "No", "Stroke": "No",
    "PhysicalHealth": "3", "MentalHealth": "30", "DiffWalking": "No", "Sex": "Female",
    "AgeCategory": "55-59", "Race": "White", "Diabetic": "Yes", "PhysicalActivity": "Yes",
    "GenHealth": "Very good", "SleepTime": "5", "Asthma": "Yes", "KidneyDisease": "No",
    "SkinCancer": "Yes"}]

Step 2: Selection of the endpoint
Run the following lines of code to select the endpoint where the model is deployed. In this method, we use the display name of the endpoint (instead of the endpoint ID); "tabu" is the display name of the endpoint where the model is deployed. The full path of the endpoint (along with the endpoint ID) will be displayed in the output:
endpoint_tabular = gcai.Endpoint(gcai.Endpoint.list(
    filter=f'display_name={"tabu"}',
    order_by='update_time')[-1].gca_resource.name)
print(endpoint_tabular)

Step 3: Prediction
Run the following lines of code to get the prediction from the deployed model:
tab_endpoint = endpoint_tabular  # the endpoint object selected in Step 2
tab_explain_response = tab_endpoint.explain(instances=instances_tabular)
print(tab_explain_response)
The prediction results will be displayed as shown in the following figure, which contains the classes and the probability of each class:

Figure 10.23: Predictions from deployed tabular classification model
Step 4: Explanations
Run the following lines of code to get the explanations for the input record:
import matplotlib.pyplot as plt  # needed for the bar chart below, if not already imported
key_attributes = tab_explain_response.explanations[0].attributions[0].feature_attributions.items()
explanations = {key: value for key, value in sorted(key_attributes, key=lambda items: items[1])}
plt.rcParams["figure.figsize"] = [5, 5]
fig, ax = plt.subplots()
ax.barh(list(explanations.keys()), list(explanations.values()))
plt.show()

A Shapley value is provided in the explanations for each of the features, and it is visualized as shown in the following figure:

Figure 10.24: Explanations from deployed tabular classification model
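
As background (this is the standard game-theoretic definition rather than anything specific to Vertex AI), the Shapley value attributed to feature $i$ is its average marginal contribution over all subsets of the remaining features:

$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N|-|S|-1)!}{|N|!} \left( v(S \cup \{i\}) - v(S) \right)$$

where $N$ is the set of features and $v(S)$ is the model output obtained using only the features in $S$. In practice, Vertex AI approximates these values (for example, with the sampled Shapley method) instead of enumerating every subset.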
Deletion of resources
We have utilized cloud storage to store the data; delete the files from cloud storage manually. Datasets were created for the image data and the tabular data; delete them manually as well. Classification models for image and tabular data were deployed to get the predictions and explanations; ensure that you un-deploy the models from the endpoints and delete the endpoints (refer to Chapter 2, Introduction to Vertex AI & AutoML Tabular, and Chapter 3, AutoML Image, text and pre-built models). Predictions were obtained using the workbench; ensure that you delete the workbench instance.
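
If you prefer to clean up from the notebook instead of the console, a minimal sketch along the following lines should also work (it assumes gcai is the Vertex AI SDK module used in the earlier steps and that the endpoint display name is "tabu"; adjust the names to match your resources):
# Un-deploy all models from the tabular endpoint created earlier, then delete it.
for endpoint in gcai.Endpoint.list(filter=f'display_name={"tabu"}'):
    endpoint.undeploy_all()  # removes every deployed model from the endpoint
    endpoint.delete()        # deletes the endpoint itself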

Explainability – Explainable AI

Step 5: Explainability

Explainability needs to be configured in two places when working with AutoML images: during the training phase of the model and while deploying the model. Follow the steps below to configure the explainability of the model during the training phase.

The steps shown below are for the Integrated gradients method of explainability, as shown in the following screenshot:

Figure 10.7: Explainability of image classification model

  1. Enable Generate explainable bitmaps.
  2. Set the Visualization type to Outlines (Pixels is another option, useful for understanding which pixels play an important role in the prediction).
  3. For Color map, select Pink/Green (pink and green are used to highlight the areas on the image).
  4. The Clip below and Clip above parameters are used to reduce the noise. Enter 70 and 99.9 for Clip below and Clip above, respectively.
  5. Select Original under Overlay type (pixels will be highlighted on top of the original image).
  6. Enter 50 for the Number of integral steps (increasing this parameter will reduce the approximation error).

Scroll down and follow the steps below to set the parameters for the XRAI method:

Figure 10.8: Explainability of image classification model (XRAI)

  1. Choose the Color map.
  2. The Clip below and Clip above parameters are used to reduce the noise. Enter 70 and 99.9 for Clip below and Clip above, respectively.
  3. Select Original under Overlay type (pixels will be highlighted on top of the original image).
  4. Enter 50 for the Number of integral steps (increasing this parameter will reduce the approximation error).
  5. Click CONTINUE.

Step 6: Compute and pricing

Follow the steps below to configure the budget for the model training:

Figure 10.9: Compute and training for image classification model

  1. Set the node hours to 8 (the minimum value for image data).
  2. Click START TRAINING.

It will take a few hours to train the image classification model. Once training is complete, the prediction and the explanation for the prediction can be obtained.

Example-based explanations – Explainable AI

For example-based explanations, Vertex AI uses nearest neighbor search to produce a list of instances (usually taken from the training set) that are most similar to the input. These examples allow users to investigate and clarify the behavior of the model, since users can reasonably expect that similar inputs will result in similar predictions.

Consider the following scenario: users have a model that analyzes photos to determine whether they depict a bird or an aircraft, but the model incorrectly identifies certain birds as planes. To figure out what is going on, we can extract other photos from the training set that are comparable to the one we are looking at and use example-based explanations to see what is occurring. When we look at those instances, we notice that many of the incorrectly identified birds, as well as the training examples most comparable to them, are dark silhouettes, and that most of the dark silhouettes in the training set were aircraft. This suggests that users could improve the quality of the model by including more silhouetted birds in the training set.

Example-based explanations may also help identify confusing inputs that could be improved with human labelling. Models that provide an embedding or latent representation for the inputs are supported; tree-based models, which do not provide embeddings for the inputs, are not supported for example-based explanations.

Feature-based explanations

Feature-based explanations are another way of explaining model output, based on the input features. Feature attributions show how much each feature in the model contributed to the prediction made for a particular instance. When users request predictions, they get predicted values appropriate to the model being used; feature attribution information is provided when users request explanations.

Feature attributions work on image and tabular data and are supported for AutoML and custom-trained models (classification models only for image data, and classification or regression models for tabular data).
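
As a reminder of how this looks in code, the following sketch mirrors the tabular prediction steps earlier in this chapter (it assumes a deployed endpoint object named endpoint_tabular and an input list named instances_tabular, as defined there):
# Request predictions together with feature attributions from the deployed endpoint.
response = endpoint_tabular.explain(instances=instances_tabular)
# Each explanation carries a per-feature attribution value for the predicted output.
feature_attributions = response.explanations[0].attributions[0].feature_attributions
for name, value in feature_attributions.items():
    print(name, value)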

What is Explainable AI – Explainable AI

Introduction

This last chapter of the book covers explainable AI. We will start by understanding what explainable AI is and why it is needed, how explainable AI works on Vertex AI (for image and tabular data), and how to get explanations from the deployed model.

Structure

In this chapter, we will discuss the following topics:

  • What is Explainable AI
  • Need of Explainable AI
  • XAI on Vertex AI
  • Data for Explainable AI exercise
  • Model training for image data
  • Image classification model deployment
  • Explanations for image classification
  • Tabular classification model deployment
  • Explanations for tabular data
  • Deletion of resources
  • Limitations of Explainable AI

Objectives

By the end of this chapter, you will have a good idea about explainable AI and will know how to get the explanations from the deployed model in Vertex AI.

What is Explainable AI

Explainable AI (XAI) is a subfield of Artificial Intelligence (AI) that focuses on developing methods and strategies for using AI in a way that makes the outcomes of the solution understandable to human specialists. The mission of XAI is to ensure that AI systems are transparent about not just the function they perform but also the purpose they serve. Interpretability is the broader umbrella within AI, and explainable AI is one of its subcategories. Thanks to a model’s interpretability, users can grasp what the model is learning, the additional information it needs to provide, and the reasoning behind its judgments about the real-world problem we are seeking to solve.

Explainable AI is one of the core ideas that define trust in AI systems (along with accountability, reproducibility, lack of machine bias, and resiliency). Developing AI that is explainable is an aim shared by data scientists and machine learning technologists.

Creation of entity type – Vertex AI Feature Store

Step 8: Creation of entity type
The entity type will be created under the newly created feature store using the create_entity_type method. Run the following code in a new cell to create the entity type. The output of the last line of the code provides the path of the newly created entity type. Check the feature store landing page: the newly created feature store and entity type will be displayed:
entity_creation = client_admin.create_entity_type(
    fs_s.CreateEntityTypeRequest(
        parent=client_admin.featurestore_path(Project_id, location, featurestore_name),
        entity_type_id=Entity_name,
        entity_type=entity_type.EntityType(
            description="employee entity",
        ),
    )
)
print(entity_creation.result())

Step 9: Creation of feature
Once the feature store and entity type are created, the features need to be created before ingesting the feature values. For each feature, information on the feature ID, value type, and description is provided. Run the following code in a new cell to add the features:
client_admin.batch_create_features(
    parent=client_admin.entity_type_path(Project_id, location, featurestore_name, Entity_name),
    requests=[
        fs_s.CreateFeatureRequest(
            feature=feature.Feature(
                value_type=feature.Feature.ValueType.INT64,
                description="employee id",
            ),
            feature_id="employee_id",
        ),
        fs_s.CreateFeatureRequest(
            feature=feature.Feature(
                value_type=feature.Feature.ValueType.STRING,
                description="education",
            ),
            feature_id="education",
        ),
        fs_s.CreateFeatureRequest(
            feature=feature.Feature(
                value_type=feature.Feature.ValueType.STRING,
                description="gender",
            ),
            feature_id="gender",
        ),
        fs_s.CreateFeatureRequest(
            feature=feature.Feature(
                value_type=feature.Feature.ValueType.INT64,
                description="no_of_trainings",
            ),
            feature_id="no_of_trainings",
        ),
        fs_s.CreateFeatureRequest(
            feature=feature.Feature(
                value_type=feature.Feature.ValueType.INT64,
                description="age",
            ),
            feature_id="age",
        ),
    ],
).result()

Once the features are created, they are displayed in the output of the cell as shown in Figure 9.19:

Figure 9.19: Addition of features to the feature store using Python

Step 10: Define the ingestion job
As seen in the web console, feature values can be ingested from cloud storage or BigQuery. We shall use the same CSV file that was uploaded to cloud storage. Importantly, we also need to supply timestamp information while ingesting the values. Timestamps can be provided in the code, or there can be a separate column in the data that contains the timestamp information. Timestamp information must be in google.protobuf.Timestamp format. Run the following code in a new cell to define the ingestion job:
seconds = int(datetime.datetime.now().timestamp())
timestamp_input = Timestamp(seconds=seconds)
ingest_data_csv = fs_s.ImportFeatureValuesRequest(
    entity_type=client_admin.entity_type_path(
        Project_id, location, featurestore_name, Entity_name
    ),
    csv_source=io.CsvSource(
        gcs_source=io.GcsSource(
            uris=[
                "gs://feature_store_input/employee_promotion_data_fs.csv"
            ]
        )
    ),
    entity_id_field="employee_id",
    feature_specs=[
        ImportFeatureValuesRequest.FeatureSpec(id="employee_id"),
        ImportFeatureValuesRequest.FeatureSpec(id="education"),
        ImportFeatureValuesRequest.FeatureSpec(id="gender"),
        ImportFeatureValuesRequest.FeatureSpec(id="no_of_trainings"),
        ImportFeatureValuesRequest.FeatureSpec(id="age"),
    ],
    feature_time=timestamp_input,
    worker_count=1,
)

Note: If all feature values were generated at the same time, there is no need to have a timestamp column. Users can specify the timestamp as part of the ingestion request.
Step 11: Initiation of ingestion job
The ingestion job needs to be initiated after it is defined. Run the following lines of code to begin the ingestion process:
ingest_data = client_admin.import_feature_values(ingest_data_csv)
ingest_data.result()

Once the ingestion is complete, it will provide information on the number of feature values ingested as shown in Figure 9.20:

Figure 9.20: Ingestion of feature values using Python

Entity type created successfully – Vertex AI Feature Store

Step 6: Entity type created successfully

The entity type is created successfully as shown in Figure 9.8 under the selected feature store:

Figure 9.8: Entity type created and listed on the landing page

  1. Click the newly created Entity type.

Step 7: Creation of features

Once the entity type is created, features need to be created before ingesting the values. Follow the steps mentioned in Figure 9.9 to create features:

Figure 9.9: Creation of features

  1. Click ADD FEATURES.

A new side tab will pop out to enter the features, as shown in Figure 9.10. Follow the steps below to create the features:

Figure 9.10: Adding user input for feature creation

  1. Enter the Feature name.
  2. Enter the Value type stored in that feature.
  3. Enter the Description for the feature.
  4. Click Add Another Feature to add new features.
  5. Click SAVE, once all the features are added.

Step 8: Features created successfully

Once the features are created successfully, they are displayed on the entity type page as shown in Figure 9.11:

Figure 9.11: Features listed under the entity type

  1. Newly created features are displayed in tabular format.
  2. Click on Ingest Values to add the feature values.

Step 9: Ingesting feature values

Follow the steps mentioned in Figure 9.12 to initiate the ingestion of feature values:

Figure 9.12: Importing data to features

  1. Data can be ingested from cloud storage or BigQuery. Select Cloud Storage CSV file.
  2. Select the CSV file from the cloud storage by clicking BROWSE.
  3. Click CONTINUE and follow the steps mentioned in Figure 9.13:

After selecting the data source, we need to map the columns of the data source to the features. Follow the steps mentioned in Figure 9.13 to map the features:

Figure 9.13: Mapping of columns to features

  1. Add employee ID, since that is the column containing unique values.
  2. Choose to enter the Timestamp manually. If the data contains timestamp values, that column can be used here instead.
  3. Select the date and time.
  4. Map the column names in the CSV file to the features.
  5. Click INGEST to initiate the ingestion job.

Step 10: Ingestion job successful

Once the feature values are ingested successfully, the ingestion job status will be updated as shown in Figure 9.14:

Figure 9.14: Ingestion jobs of feature store

  1. The ingestion job is completed successfully.

Step 11: Landing page of feature store after the creation of feature store, entity type, and features

The landing page of the feature store is shown in Figure 9.15; all the features under the entity type and feature store are listed and displayed in tabular format:

Figure 9.15: Landing page of feature store after the creation of features

  1. Click the age feature. The window will navigate to the properties of the feature as shown in Figure 9.16:

Figure 9.16: Properties of feature

  1. For all the features, Feature Properties consisting of basic information and statistics are displayed.
  2. Metrics are populated if the monitoring feature is enabled for the feature store and for that particular feature.

Knowing Vertex AI feature store – Vertex AI Feature Store

Introduction

After learning about the pipelines of the platform, we will move to the feature store of GCP. In this chapter, we will start with an understanding of the feature store and its advantages, followed by hands-on work with the feature store.

Structure

In this chapter, we will cover the following topics:

  • Knowing Vertex AI feature store
  • Hierarchy of feature store
  • Advantages of feature store
  • Disadvantages of feature store
  • Working on feature store using GUI
  • Working on feature store using Python
  • Deleting resources
  • Best practices for Feature store

Objectives

By the end of this chapter, users will have a good idea about the feature store, when to use it, and how to employ it with the web console of GCP and Python.

Knowing Vertex AI feature store

Vertex AI Feature Store is a centralized repository for managing and delivering machine learning features. To speed up the process of creating and delivering high-quality ML applications, many organizations are turning to centralized feature stores to facilitate the sharing, discovery, and re-use of ML features at scale.

The storage and processing power, as well as other components of the backend infrastructure, are handled by Vertex AI Feature Store, making it a fully managed solution. As a result of this strategy, data scientists may ignore the difficulties associated with delivering features into production and instead concentrate on the feature computation logic.

The feature store in Vertex AI is an integral aspect of the overall system. Use Vertex AI Feature Store on its own or include it in your existing Vertex AI workflows. For instance, the Vertex AI Feature Store may be queried for information to be used in the training of custom or AutoML models.

Hierarchy of feature store

The collection of entities for a certain entity type is stored in a feature store. Fields like an entity ID, a timestamp, and a series of attributes like feature 1, feature 2, and so on are defined for each entity type. The hierarchy of the feature store is described in Figure 9.1 and illustrated with resource paths in the sketch after the following list:

Figure 9.1: Hierarchy of feature store

  • Feature store: A top-level container for entity types, features, and their values.
  • Entity type: A collection of semantically related features (real or virtual).
  • Entity: An instance of the entity type.
  • Feature: A measurable property or attribute of an entity type.
  • Feature values: These contain values of the features at a specific point in time.
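
The same hierarchy is reflected in the resource paths used by the SDK later in this chapter. The following sketch uses the static path helpers of the v1 Featurestore service client with placeholder values (the import alias may differ from the one used in your own setup):
from google.cloud import aiplatform_v1

client_cls = aiplatform_v1.FeaturestoreServiceClient
# Feature store: projects/{project}/locations/{location}/featurestores/{featurestore}
print(client_cls.featurestore_path("my-project", "us-central1", "employee_fs"))
# Entity type: .../featurestores/{featurestore}/entityTypes/{entity_type}
print(client_cls.entity_type_path("my-project", "us-central1", "employee_fs", "employee"))
# Feature: .../entityTypes/{entity_type}/features/{feature}
print(client_cls.feature_path("my-project", "us-central1", "employee_fs", "employee", "age"))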

Pipeline artifacts stored in cloud storage – Pipelines using TensorFlow Extended

Step 10: Pipeline artifacts stored in cloud storage

Pipeline artifacts are stored in the cloud storage as shown in Figure 8.15:

  • The module folder contains the Python files that we pushed while creating the transform and trainer components.
  • The output_model folder contains the trained classification model.
  • The root folder contains the artifacts of each component of the pipeline:

Figure 8.15: Pipeline artifacts stored in the cloud storage

Deletion of resources

We have utilized the workbench and cloud storage to store the data and the artifacts of the pipeline. To delete the resources, make sure to delete the workbench instance and clear the data stored in cloud storage.

Conclusion

We learnt about TFX and a few of its components, and constructed a pipeline using some of its standard components. We also understood how to use Kubeflow for the orchestration of a TFX pipeline on Vertex AI.

In the next chapter, we will start understanding and working with the feature store of Vertex AI.

Questions

  1. Which artifacts of the transform component are used in the training component of the pipeline?
  2. What are the different orchestration options TFX supports?
  3. Try using the evaluator component between the trainer and pusher components, and re-construct the pipeline. (Use the evaluator component to evaluate the model’s performance and push the model only if the performance is good.)