diff --git a/applications/OneFormer/Inference_with_OneFormer.ipynb b/applications/OneFormer/Inference_with_OneFormer.ipynb
new file mode 100644
index 000000000..911306d4c
--- /dev/null
+++ b/applications/OneFormer/Inference_with_OneFormer.ipynb
@@ -0,0 +1,458 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "f58b9dcf",
+   "metadata": {},
+   "source": [
+    "# Inference with OneFormer: Universal Image Segmentation\n",
+    "Original paper: https://arxiv.org/abs/2211.06220\n",
+    "OneFormer integrates a text module into the Mask2Former framework, in order to condition the model on the respective subtask (instance, semantic, or panoptic). This yields more accurate results, at the cost of added latency."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7a37a1f9",
+   "metadata": {},
+   "source": [
+    "## Environment Setup\n",
+    "MindSpore 2.5.0\n",
+    "\n",
+    "MindNLP 0.4.0\n",
+    "\n",
+    "Python 3.9.0"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c6085611",
+   "metadata": {},
+   "source": [
+    "## Loading the Image\n",
+    "\n",
+    "Next, we load an image on which we'd like to perform inference. Here we load the familiar cats image, which is part of the COCO dataset."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "bf9de159-3e3c-4ef1-b54c-ab2623fe8526",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from PIL import Image\n",
+    "import requests\n",
+    "\n",
+    "url = 'http://images.cocodataset.org/val2017/000000039769.jpg'\n",
+    "image = Image.open(requests.get(url, stream=True).raw)\n",
+    "image"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7863d8a8",
+   "metadata": {},
+   "source": [
+    "## Preparing the Image for the Model\n",
+    "\n",
+    "We can prepare the image using the processor. OneFormer relies on a processor that internally consists of an image processor (for the image modality) and a tokenizer (for the text modality). OneFormer is actually a multimodal model, since it combines image and text to solve image segmentation."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3e936c0d-d6cd-426d-bf9e-e05dff42e145",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from mindnlp.transformers import AutoProcessor\n",
+    "\n",
+    "# the Auto API loads a OneFormerProcessor for us, based on the checkpoint\n",
+    "processor = AutoProcessor.from_pretrained(\"shi-labs/oneformer_coco_swin_large\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5f159b71-4f30-43b9-834d-e53cb6c20153",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# prepare image for the model\n",
+    "panoptic_inputs = processor(images=image, task_inputs=[\"panoptic\"], return_tensors=\"ms\")\n",
+    "for k,v in panoptic_inputs.items():\n",
+    "    print(k,v.shape)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c3b95c44",
+   "metadata": {},
+   "source": [
+    "As you can see, this model takes an additional \"task_inputs\", which MaskFormer and Mask2Former do not have. These text inputs allow the model to distinguish between instance/semantic/panoptic segmentation.\n",
+    "\n",
+    "We can decode the task inputs back into text:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f74a380a-7245-4c3b-bd4c-ee65481b8eb5",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "processor.tokenizer.batch_decode(panoptic_inputs.task_inputs)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "11d76bcc",
+   "metadata": {},
+   "source": [
+    "## Loading the Model\n",
+    "\n",
+    "Next, let's load a model from mindnlp/transformers. Here, we load a OneFormer model with a Swin-large backbone, trained on the COCO dataset."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "id": "c53c4c55-17ea-4be2-a761-ec793413a9bc",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from mindnlp.transformers import AutoModelForUniversalSegmentation\n",
+    "\n",
+    "model = AutoModelForUniversalSegmentation.from_pretrained(\"shi-labs/oneformer_coco_swin_large\")"
+   ]
+  },
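+  {
+   "cell_type": "markdown",
+   "id": "a1b2c3d4",
+   "metadata": {},
+   "source": [
+    "As a quick sanity check, we can look at the label space that ships with the checkpoint's config (a minimal sketch; `id2label` is the same mapping the plotting functions below rely on):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b2c3d4e5",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# the checkpoint config carries the COCO panoptic label space\n",
+    "print(len(model.config.id2label), \"classes\")\n",
+    "# peek at the first few id -> label entries\n",
+    "for idx in list(model.config.id2label)[:5]:\n",
+    "    print(idx, model.config.id2label[idx])"
+   ]
+  },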
"source": [ + "from mindnlp.core import ops, no_grad\n", + "\n", + "# forward pass\n", + "with no_grad():\n", + " outputs = model(**panoptic_inputs)" + ] + }, + { + "cell_type": "markdown", + "id": "b00c6dbf", + "metadata": {}, + "source": [ + "# 可视化\n", + "\n", + "\n", + "\n", + "接下来,我们可以对原始输出进行后处理,并将预测可视化。" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e740410c-7081-407a-9a06-03e159f14e04", + "metadata": {}, + "outputs": [], + "source": [ + "panoptic_segmentation = processor.post_process_panoptic_segmentation(outputs, target_sizes=[image.size[::-1]])[0]\n", + "print(panoptic_segmentation.keys())" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b36ee706-a401-4d39-87ea-63d77fcd299a", + "metadata": {}, + "outputs": [], + "source": [ + "from collections import defaultdict\n", + "import matplotlib.pyplot as plt\n", + "from matplotlib import cm\n", + "import matplotlib.patches as mpatches\n", + "import numpy as np\n", + "from mindspore import Tensor\n", + "\n", + "def draw_panoptic_segmentation(segmentation, segments_info):\n", + "\n", + " if isinstance(segmentation, Tensor):\n", + " segmentation_np = segmentation.asnumpy()\n", + " else:\n", + " segmentation_np = np.array(segmentation)\n", + " \n", + " if not np.issubdtype(segmentation_np.dtype, np.integer):\n", + " segmentation_np = segmentation_np.astype(np.int32)\n", + " \n", + " # Get the maximum segment ID using numpy\n", + " max_segment = np.max(segmentation_np)\n", + " viridis = cm.get_cmap('viridis', max_segment + 1) \n", + " \n", + " fig, ax = plt.subplots()\n", + " ax.imshow(segmentation_np)\n", + " \n", + " instances_counter = defaultdict(int)\n", + " handles = []\n", + " \n", + " for segment in segments_info:\n", + " segment_id = segment['id']\n", + " segment_label_id = segment['label_id']\n", + " segment_label = model.config.id2label[segment_label_id] \n", + " label = f\"{segment_label}-{instances_counter[segment_label_id]}\"\n", + " instances_counter[segment_label_id] += 1\n", + " color = viridis(segment_id)\n", + " handles.append(mpatches.Patch(color=color, label=label))\n", + " \n", + " ax.legend(handles=handles)\n", + " plt.savefig('cats_panoptic.png')\n", + "draw_panoptic_segmentation(**panoptic_segmentation)\n" + ] + }, + { + "cell_type": "markdown", + "id": "f24c019a", + "metadata": {}, + "source": [ + "可以看出,该模型能够正确区分两只不同的猫以及两个不同的遥控器。" + ] + }, + { + "cell_type": "markdown", + "id": "8acb48fc", + "metadata": {}, + "source": [ + "## 推理:语义分割\n", + "我们还可以使用相同的模型对猫咪图像进行语义分割!我们只需要更改任务输入(即模型的文本输入),将其改为“此任务为语义”。" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c5496e90-37f4-4fe7-a8db-521b60f2ea37", + "metadata": {}, + "outputs": [], + "source": [ + "# prepare image for the model\n", + "semantic_inputs = processor(images=image, task_inputs=[\"semantic\"], return_tensors=\"ms\")\n", + "for k,v in semantic_inputs.items():\n", + " print(k,v.shape)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "ab5497ee-1c8d-4740-bf22-65d2de5eac2c", + "metadata": {}, + "outputs": [], + "source": [ + "# forward pass\n", + "with no_grad():\n", + " outputs = model(**semantic_inputs)" + ] + }, + { + "cell_type": "markdown", + "id": "6cd6bb98", + "metadata": {}, + "source": [ + "让我们对结果进行后处理并可视化:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b0400d2e-d921-422b-b8d9-b08f872251f7", + "metadata": {}, + "outputs": [], + "source": [ + "semantic_segmentation = processor.post_process_semantic_segmentation(outputs)[0]\n", + 
"semantic_segmentation.shape" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "73b6a7f7-aaf9-408e-bcf9-b21dbf38e937", + "metadata": {}, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "import matplotlib.patches as mpatches\n", + "from matplotlib.colors import ListedColormap, LinearSegmentedColormap\n", + "from matplotlib import cm\n", + "\n", + "\n", + "def draw_semantic_segmentation(segmentation):\n", + "\n", + " if not isinstance(segmentation, np.ndarray):\n", + " segmentation = np.array(segmentation)\n", + " \n", + " segmentation = segmentation.astype(np.int32)\n", + " \n", + " max_label = np.max(segmentation) \n", + " viridis = cm.get_cmap('viridis', max_label)\n", + " \n", + " labels_ids = np.unique(segmentation).tolist()\n", + " \n", + " fig, ax = plt.subplots()\n", + " ax.imshow(segmentation, cmap=viridis) \n", + " handles = []\n", + " \n", + " for label_id in labels_ids:\n", + " label = model.config.id2label[label_id]\n", + " color = viridis(label_id / max_label) \n", + " handles.append(mpatches.Patch(color=color, label=label))\n", + " \n", + " ax.legend(handles=handles)\n", + "\n", + "draw_semantic_segmentation(semantic_segmentation)" + ] + }, + { + "cell_type": "markdown", + "id": "a65f1ace", + "metadata": {}, + "source": [ + "可以看到,在语义分割中,不会区分单个实例(可数的事物,如猫咪或遥控器)。相反,只会为“猫咪”类别等生成一个单一的掩码。" + ] + }, + { + "cell_type": "markdown", + "id": "78bdb604", + "metadata": {}, + "source": [ + "## 推理:实例分割\n", + "\n", + "同样,我们可以使用相同的模型进行实例分割,我们只需要更改文本输入即可。" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0ce50c61-ef57-4773-979f-a115f847b0a2", + "metadata": {}, + "outputs": [], + "source": [ + "# prepare image for the model\n", + "instance_inputs = processor(images=image, task_inputs=[\"instance\"], return_tensors=\"ms\")\n", + "for k,v in instance_inputs.items():\n", + " print(k,v.shape)" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "08d35d25-df61-4025-8a75-32828818f0e8", + "metadata": {}, + "outputs": [], + "source": [ + "# forward pass\n", + "with no_grad():\n", + " outputs = model(**instance_inputs)" + ] + }, + { + "cell_type": "markdown", + "id": "06b9ecbe", + "metadata": {}, + "source": [ + "让我们对结果进行后处理并可视化:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f1049e52-8fc0-45c6-9ee9-4d4b80d9dd2d", + "metadata": {}, + "outputs": [], + "source": [ + "instance_segmentation = processor.post_process_instance_segmentation(outputs)[0]\n", + "instance_segmentation.keys()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0155c6e6-add1-44ac-9845-f80a2fe6e063", + "metadata": {}, + "outputs": [], + "source": [ + "from collections import defaultdict\n", + "import matplotlib.pyplot as plt\n", + "from matplotlib import cm\n", + "import matplotlib.patches as mpatches\n", + "import numpy as np # 确保导入 numpy\n", + "\n", + "def draw_instance_segmentation(segmentation, segments_info):\n", + " # 转换数据类型(如果是张量或 object 类型)\n", + " if hasattr(segmentation, 'asnumpy'): # 处理 MindSpore 张量\n", + " segmentation = segmentation.asnumpy()\n", + " segmentation = np.array(segmentation, dtype=np.int32) # 强制转换为 int32\n", + " \n", + " # 获取颜色映射\n", + " max_segment_id = np.max(segmentation) # 使用 NumPy 的 max\n", + " viridis = cm.get_cmap('viridis', max_segment_id)\n", + " \n", + " fig, ax = plt.subplots()\n", + " ax.imshow(segmentation) # 现在 segmentation 是数值类型\n", + " \n", + " instances_counter = defaultdict(int)\n", + " handles = []\n", + " for segment in segments_info:\n", + " segment_id = 
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0155c6e6-add1-44ac-9845-f80a2fe6e063",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from collections import defaultdict\n",
+    "import matplotlib.pyplot as plt\n",
+    "from matplotlib import cm\n",
+    "import matplotlib.patches as mpatches\n",
+    "import numpy as np  # ensure numpy is imported\n",
+    "\n",
+    "def draw_instance_segmentation(segmentation, segments_info):\n",
+    "    # convert the data type (handles MindSpore tensors and object arrays)\n",
+    "    if hasattr(segmentation, 'asnumpy'):\n",
+    "        segmentation = segmentation.asnumpy()\n",
+    "    segmentation = np.array(segmentation, dtype=np.int32)  # force-cast to int32\n",
+    "\n",
+    "    # build the color map, one color per segment id (including 0)\n",
+    "    max_segment_id = np.max(segmentation)  # use NumPy's max\n",
+    "    viridis = cm.get_cmap('viridis', max_segment_id + 1)\n",
+    "\n",
+    "    fig, ax = plt.subplots()\n",
+    "    ax.imshow(segmentation)  # segmentation is now a numeric array\n",
+    "\n",
+    "    instances_counter = defaultdict(int)\n",
+    "    handles = []\n",
+    "    for segment in segments_info:\n",
+    "        segment_id = segment['id']\n",
+    "        segment_label_id = segment['label_id']\n",
+    "        segment_label = model.config.id2label[segment_label_id]\n",
+    "        label = f\"{segment_label}-{instances_counter[segment_label_id]}\"\n",
+    "        instances_counter[segment_label_id] += 1\n",
+    "        color = viridis(segment_id)\n",
+    "        handles.append(mpatches.Patch(color=color, label=label))\n",
+    "\n",
+    "    ax.legend(handles=handles)\n",
+    "    plt.savefig('cats_instance.png')\n",
+    "\n",
+    "# call the function (instance_segmentation contains the expected keys)\n",
+    "draw_instance_segmentation(**instance_segmentation)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "MindSpore",
+   "language": "python",
+   "name": "mindspore"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.10"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}