Jetson Nano Custom Object Detection - how to train your own AI

Hey, hey, robot makers, how are you? I hope you're having a good day so far. Do you want to know how to set up a Jetson Nano to detect custom objects and see those detected objects in real time? Then this is the show for you, so let's dive straight in. Come with me as we learn to build robots, bring them to life with code, and have a whole load of fun along the way. Okay, let me switch over to my main talk. Richard says every day is Christmas for Kevin. I've just received a lot of stuff, so yes, I was playing with this little M5Stamp earlier. I'll take a look at that towards the end of the show, along with a few other things I've got to show you, but this is all about object detection, so let's take a look. Okay? This session is about how to train our neural network to detect custom objects. We've done something similar before on the Raspberry Pi, but this is considerably faster, light years faster. We'll look at the types of computer vision. We'll look at something called MobileNet SSD and what that means. We'll look at how we prepare our model: capturing the assets (the images), labelling those images, training the model and then converting it to something called ONNX, and we'll look at what that is too. Finally, we'll use the model and do a bit of a demo. It's so cool, I can't wait to show you this. So here's what we're aiming for.
We want to be able to detect, identify and locate different types of robots in a live video scene. In the little demo I've got a snapshot of there, you can see it detected an Otto DIY in the scene and also a SMARS robot. You can see over my shoulder here: this is where I've been doing the image capture. If I go to my overhead camera, you can see this little camera on this arm here, which is plugged straight into the Jetson Nano. I'll just switch my other camera over to the Jetson Nano so we've got it ready to use, and you can see that it's pointing at the scene. I've used a white background just so it's not cluttered and doesn't get confused by other bits and bobs, and it gets a pretty clean capture of the image. So that's what we're looking at. I decided to capture four different types of robot, although ultimately I'd like to be able to detect many different types.
Maybe a moving head, maybe... I don't know what else I had there: the Weather Bot, the robot cat, OpenCat. There are lots of different robots we could capture, but four was enough, and I'll explain why four was enough shortly.
So, computer vision models. There are three things that happen in computer vision: classification, detection and segmentation. Segmentation gives an exact outline around the shape of an object in an image, pixel by pixel, so it tells you exactly where the object is. Imagine you're driving down a street and you want to pick out a person: not just a rectangle around the person, but their exact boundary. Or maybe it's an office space and you want to know where the floor, the walls and the furniture are. That pixel-by-pixel detail is really important for knowing which segment each pixel falls into, but we're not interested in that for what we're doing today. Classification identifies what is in an image, and we'll definitely use that today to work out what type of robot we have. Detection places a bounding box around specific objects in the scene, and there can be more than one. In that little graphic you can see a house and a tree, plus some coordinates; that's what detection does. Detection itself breaks down into a couple of steps: it's the branch of computer vision that deals with the localization and identification of objects. Localization and identification are two different steps, and when we put them together we achieve object detection. Localization is specifically about locating the object within the image or video sequence; identification deals with assigning the object a specific class or label. Those are two separate things, and the MobileNet SSD we're going to use can do both very quickly, in one go, because it's optimized for that use case.
So, preparing our model. Let's look at what we have to do here. This takes some time; it took me all of Saturday. First, we need to create a label file: literally a text file where each line holds a different class, a thing we want to detect and put a label on. I chose a SMARS robot, a SMARS Quad robot, a SMARS Mini and an Otto DIY, just for a bit of variety. Then we need to collect a load of images. We're aiming for roughly a thousand images of each class, so about 4,000 images in total, ideally, to train this neural network.
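The label file described above is just plain text with one class per line. Here is a minimal Python sketch that writes it and reads it back. The four class strings are hypothetical stand-ins for the SMARS, SMARS Quad, SMARS Mini and Otto DIY robots. The helper names are illustrative and are not part of any NVIDIA tool.

```python
import tempfile
from pathlib import Path

# Hypothetical label strings for the four robots in the talk; the exact
# names used in the video are not shown.
CLASSES = ["smars", "smars_quad", "smars_mini", "otto_diy"]


def format_labels(classes: list[str]) -> str:
    """One class name per line, with a trailing newline."""
    return "\n".join(classes) + "\n"


def parse_labels(text: str) -> list[str]:
    """Read class names back, skipping blank lines and stray whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def write_labels(dataset_dir: Path, classes: list[str]) -> Path:
    """Create <dataset_dir>/labels.txt and return its path."""
    dataset_dir.mkdir(parents=True, exist_ok=True)
    labels_path = dataset_dir / "labels.txt"
    labels_path.write_text(format_labels(classes))
    return labels_path


if __name__ == "__main__":
    # Write into a throwaway directory so the demo leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_labels(Path(tmp) / "robots", CLASSES)
        print(parse_labels(path.read_text()))  # ['smars', 'smars_quad', 'smars_mini', 'otto_diy']
```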
Next, we label the objects in the images to produce an annotation file, and we can use a tool to do that. In fact, NVIDIA has merged those first two steps, so while you capture an image you can also label it. That speeds up the whole process, because it takes a long time to do. Once we have everything we need, we train our model using MobileNet SSD. Then we convert the output into ONNX, the Open Neural Network Exchange model format, which you can use on different devices once you've created it. For example, you could train a model on your Jetson Nano, move it to a Raspberry Pi, and it should work just as well, because it's an open format. Then there's Pascal VOC. We need a particular format for storing and creating these files, and many of the tools rely on this Pascal Visual Object Classes format. It's essentially a set of folders.
The tools expect those folders to have certain names, with certain files inside them. For example, the images folder can hold a bunch of images, and there's a corresponding XML file for each image in the annotations folder. Those XML files are created by your labelling tool. Each one simply records which image file it describes, which objects are in it and the coordinates of each one. That's because our neural network will pull out each of those images, resize them, convert them to grayscale and push them through the network; we'll look at that in a couple of minutes.
Creating the label file is really easy: it's just a text file. As I said, what I've got is SMARS Quad, Otto DIY, SMARS Mini and SMARS, and you can see in that screenshot that it has detected all four types. I've put a little red bubble just above each one so you can see it a bit more clearly, because sometimes that text is quite small. You also get a confidence percentage that each object has been detected correctly. The SMARS Quad is at 99.5%, so it's very confident about that. The Otto DIY is at 92.1%, probably because I didn't take as many photos of it as I did of the others. The SMARS Mini was detected at 85.7%, because it's physically smaller and has fewer pixels to play with. Here we have an overlap: the SMARS is detected twice, once at 96.4% and once at 89.6%. That's probably because there are two kinds of SMARS it can detect, one with the matrix display and one with the range finder. However, I think there's also a small bug in this detection network where it can overlap detections, and it's to do with a threshold, so I still need to fix that.
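To make the annotation side concrete, here is a sketch that reads one Pascal VOC XML file with Python's standard `xml.etree.ElementTree`. The sample XML, file name, labels and coordinates are made up. It assumes the usual VOC tags: `filename`, `object`, `name`, and `bndbox` with `xmin`/`ymin`/`xmax`/`ymax`.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Made-up annotation in Pascal VOC layout; real files are written by the labelling tool.
SAMPLE_XML = """<annotation>
  <filename>robot_0001.jpg</filename>
  <size><width>1280</width><height>720</height><depth>3</depth></size>
  <object>
    <name>otto_diy</name>
    <bndbox><xmin>412</xmin><ymin>150</ymin><xmax>598</xmax><ymax>470</ymax></bndbox>
  </object>
  <object>
    <name>smars_mini</name>
    <bndbox><xmin>700</xmin><ymin>380</ymin><xmax>820</xmax><ymax>460</ymax></bndbox>
  </object>
</annotation>"""


@dataclass
class LabelledBox:
    label: str
    xmin: int
    ymin: int
    xmax: int
    ymax: int


def parse_voc(xml_text: str) -> tuple[str, list[LabelledBox]]:
    """Return the image filename and every labelled bounding box in one VOC annotation."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        # Some tools write float coordinates, so parse via float, then truncate to int.
        coords = [int(float(bb.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append(LabelledBox(obj.findtext("name"), *coords))
    return root.findtext("filename"), boxes


filename, boxes = parse_voc(SAMPLE_XML)
for box in boxes:
    print(filename, box)
```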
So, asset capture: this is a short video of me capturing some images. You literally just draw a rectangle around the robot. Then you move it in the scene, come back, and draw the rectangle around it again. After that, you go up to the top right and say what type of object it is; here it's a SMARS robot. You unfreeze the frame, move the object again, come back and draw the rectangle again. It's like stop-motion animation. You can see the amount of effort it takes to produce a reasonable model: you need lots of images, ideally not just from one angle. You want them from above and below, in different lighting conditions, against different backgrounds, overlapping other objects and so on. The more effort you put into this, the better the model you'll get. But I knew I only had so much time, and I had to allow time for troubleshooting in case it didn't work. So yes, you need to repeat this about a thousand times for each class of object you're looking for. The software I'm using is part of NVIDIA's jetson-inference GitHub library, which we'll take a look at. It's called the camera capture utility, and it lets you capture the image, freeze the frame, draw the rectangle and classify it. Hypothetical says that using a different background also helps. Yes, because then the model can work out what's just noise and ignore it. I went with the white background, which probably isn't the best choice, but it made a nicer video to record.
So, training and testing: the process flow for this. I've taken this slide from a previous video I made on object detection using a Raspberry Pi with the moving head as the camera. If you want to see that, I'll put a link in the description of the video.
I haven't done that yet, but by the time you see this, I will have. We need to load the data and fit the model to the training dataset. Normally you separate your data into three sets: training data, test data and validation data. The model may have seen two of those sets before, but it will never have seen the third, and that's how you can tell whether it's really detecting things. What I've actually done here is this: I've got three different SMARS Quads, and the model has never seen this particular one. It has seen another one that I've got on my wall over there, and it was still able to detect this one even though it had never seen it before. That shows me the model is working well. We test the model with the test data. Weight adjustment is done automatically by the training program's algorithm, and we keep training until the results are acceptable. One way I can control that is by defining how many epochs it goes through. For this one I trained for 30 epochs. Initially I did a single epoch just to see, it worked fine, and then I went up to 30. That took around 30 to 45 minutes, which is very fast for a mobile device. Again, the next two or three slides are from the presentation I did on object detection earlier, so I'll go through them briefly; if you want more detail, go and watch that video.
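The sketch below is a toy illustration of epochs, automatic weight adjustment and a held-out test set: it trains a single sigmoid neuron with plain gradient descent. It is not MobileNet SSD or the jetson-inference training script. The data, the learning rate and the 30-epoch setting are assumptions chosen only to mirror the talk's numbers.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: list[float], bias: float, x: list[float]) -> float:
    """Weighted sum of the inputs plus bias, squashed into (0, 1)."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(samples, epochs: int, lr: float = 0.5, seed: int = 0):
    """Gradient descent on binary cross-entropy; one epoch = one pass over every sample."""
    rng = random.Random(seed)
    weights = [0.5] * len(samples[0][0])  # every weight starts at 0.5
    bias = 0.0
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            error = predict(weights, bias, x) - y  # dLoss/dz for sigmoid + cross-entropy
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def accuracy(weights, bias, samples) -> float:
    correct = sum((predict(weights, bias, x) >= 0.5) == (y == 1) for x, y in samples)
    return correct / len(samples)


# Toy data: two normalised "pixel" features; 1 = object present, 0 = background.
TRAIN = [([0.9, 0.8], 1), ([0.8, 0.9], 1), ([0.1, 0.2], 0), ([0.2, 0.1], 0)]
TEST = [([0.85, 0.95], 1), ([0.15, 0.05], 0)]  # held out: never seen during training

w, b = train(TRAIN, epochs=30)
print(f"train accuracy {accuracy(w, b, TRAIN):.2f}, test accuracy {accuracy(w, b, TEST):.2f}")
```

With `epochs=0` the untrained neuron (all weights 0.5, no bias) gets only half the training samples right. That gap is what the training epochs close.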
In essence, here's how machine learning works when you capture an image like this. The camera takes the scene and pixelates it, turning the real thing into a series of RGB values (red, green and blue), each between 0 and 255. Then a grayscale version is created, because we don't actually need red, green and blue for a lot of image processing; we just need the image itself. In grayscale, a single value between 0 and 255 gives us the gray level. We then essentially split the image into a big grid. The neural network doesn't see the way we do: it doesn't pick out areas and shapes. All it has, literally, is pixels with values between 0 and 255. Eventually those values are converted into numbers between 0 and 1, so they become floating-point numbers, each representing one cell in the grid. We convert to grayscale because it's simpler to process. We don't need the colour information, which doesn't add much value in the cases we're dealing with. Often the image is also shrunk: we might have captured an HD image, maybe 1920 by 1080, and it gets reduced to 300 by 300 pixels. That's much smaller, much easier to process, and doesn't take as long to pass through the network. In fact, I think the images we saw in our previous stream were reduced to a 28 by 28 grid, whereas we're working with 300 by 300. That's more resolution, so we get more accuracy, but it takes a bit more processing power. So, as I said, the network treats the image as a grid of values. What we see as colours are really just values between 0 and 255, so each one is tiny to store: just a byte.
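The preprocessing steps described here (grayscale, downscale, scale to 0 to 1) can be sketched in pure Python. The sketch follows the talk's description. In practice, SSD-Mobilenet models usually keep all three RGB channels, and real pipelines resize with interpolation rather than the nearest-neighbour sampling used here for simplicity. The ITU-R BT.601 luma weights are an assumption, not necessarily what the Jetson tools use.

```python
def to_grayscale(rgb_rows):
    """RGB pixels (0-255 each) -> one 0-255 gray value, using ITU-R BT.601 luma weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row] for row in rgb_rows]


def resize_nearest(img, out_w: int, out_h: int):
    """Nearest-neighbour downscale, e.g. 1920x1080 -> 300x300."""
    in_h, in_w = len(img), len(img[0])
    return [[img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)] for y in range(out_h)]


def normalise(img):
    """Map 0-255 integers to floats in [0, 1] for the network input."""
    return [[p / 255.0 for p in row] for row in img]


def preprocess(rgb_rows, size: int = 300):
    return normalise(resize_nearest(to_grayscale(rgb_rows), size, size))


# Tiny 4x4 test frame (white, red, green, blue columns) instead of a real 1920x1080 capture.
frame = [[(255, 255, 255), (255, 0, 0), (0, 255, 0), (0, 0, 255)] for _ in range(4)]
print(preprocess(frame, size=2))
```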
Then there's the neural network. All these blue things are the neurons, and the network is the connections between them. The neurons line up in layers, so each layer connects to every node in the next layer. The connections carry the weights, and the weights are what the training algorithm adjusts. (Hey Wayne, how are you?) In these networks, each neuron receives input. Initially that's just the pixel value, and it could be a bunch of pixels rather than one per neuron; a neuron could receive 10, for example. Then there are the weights coming in, which initially are set to about a half. We multiply each input value by its weight, so we just do a bit of maths: you can see these grayscale values are now multiplied by the weight of 0.5 to give us a set of values. We add them all together. Then we squash the result using something called the sigmoid function, which turns it into a value between 0 and 1, because the raw sum is too big for what we need. We saw what the sigmoid function is in the other video, so I won't cover it in this one.
The type of neural network we're using is called MobileNet SSD, where SSD stands for single-shot detection. Earlier we talked about image classification, localization and identification. Think about how you, as a programmer, would locate and identify the objects in a scene. You might scan every pixel, or a block of pixels, maybe as a moving window, to see whether it roughly matches the object you're looking for. If there can be several objects in the scene, and several classes of things, you might have to do that multiple times in multiple passes, and that wouldn't be very fast.
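To put a rough number on why the sliding-window approach is slow, this sketch enumerates every square window in a 300 by 300 image. The window sizes, the 10-pixel stride and the assumption of one pass per class are all hypothetical. The point is only the count of classifier runs, compared with a single forward pass.

```python
def sliding_windows(img_w: int, img_h: int, sizes: list[int], stride: int):
    """Yield (x, y, size) for every square window that fits inside the image."""
    for size in sizes:
        for y in range(0, img_h - size + 1, stride):
            for x in range(0, img_w - size + 1, stride):
                yield x, y, size


windows = list(sliding_windows(300, 300, sizes=[50, 100, 150], stride=10))
classes = 4
print(f"{len(windows)} windows; checking {classes} classes one at a time "
      f"means {len(windows) * classes} classifier runs")
print("single-shot detection: one forward pass covers every class and instance")
```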
Single-shot detection, on the other hand, passes through the neural network once and can detect all the classes and all the instances of objects in that image, so it's very, very fast and very efficient. You'll see it very commonly when neural networks are used for this kind of work, and MobileNet SSD is what we'll use today as well.
If you like these videos, remember to give them a like. If you're watching on Facebook or YouTube right now, give me a thumbs up and leave me a comment too. Let me know what you think about this: whether you've used it, whether you have a Jetson Nano or are thinking of getting one, or whether you've used the Raspberry Pi or something like an ESP32. I think those can do some of this processing, but I haven't tried it myself yet. I brought the ESP32 camera with me, and I intend to stream video from it and run that through the NVIDIA neural network to see whether it can handle several wireless cameras. That would be pretty cool. So yes, if you like these videos, give me a thumbs up and a comment, and if you haven't subscribed to the channel, what are you waiting for? Subscribe now. I've also got... oops, that looks interesting. It looks like I've disappeared there; let me go back for a second. I haven't adjusted these overlays in a while, so I think the camera must have moved.
So if I go there... fantastic. I'll do the same on the other two. So yes, I make a video every Sunday at seven o'clock Greenwich Mean Time. I think we're on GMT plus one for just one more month, and then it goes back to Greenwich Mean Time, so work it out for wherever you live in the world. Let's do the next one. I think it'll need the same fix, so I'll change it quickly; let me switch the camera back. There we go, now I'll show you what this is for. Check out the website smarsfan.com; that's where I put all the tutorials. I've actually given it a bit of a makeover since I made this little clip, so I probably need to update this call to action too. I've also recently upgraded my capture device. I used to use those little HDMI video capture dongles to bring in the HDMI video from my camera, and now I'm using a proper Elgato, the HD60 Plus I think it's called. It's a bit sharper and can do 60 frames per second. And the last one I was going to show you... again, this probably needs fixing, so let me do that too. Perfect. So yes, if you want to support the show, you can go to buymeacoffee.com/kevinmcaleer. You can also download some of the things I've put there, such as the Top Trumps cards I created in last week's video. Some people have downloaded them already, so if you haven't grabbed a copy yet, head over there. That helps support the show and pay for new equipment. I'm looking for a new camera for my overhead shots, because the one I have at the moment, this one here, isn't great. It was a cheap USB camera, and it's not the best camera in the world.
So if you want to help support the show, you can do that at buymeacoffee.com/kevinmcaleer; buying me a coffee helps support the show. I think that's everything I wanted to cover there, so let's get back to the main talk about our new video.
Right, we've covered how the model learns, so this is where the magic happens: we're going to train the model. We can retrain an existing model, and that's actually what we did with MobileNet SSD. We took an existing model that has already been trained to detect things like people, cats and pens (there are quite a few objects it can detect), essentially added our four classes to it, and retrained it. That doesn't take as long, because it builds on the learning that has already taken place. We run the train_ssd.py script. These are all Python scripts, and there are also C versions in the repository, which we'll look at in a minute too.
I found some things out the hard way. The label file has to be called labels.txt or it doesn't work properly. That must have wasted an hour of my life, because I think I'd called it smars labels or class labels or something like that. Also, the path you pass in for where the model lives shouldn't start with a slash. I was getting a bit ahead of myself and putting a forward slash at the front, which pointed the path at the root directory instead of the current working directory, and that threw me for probably another half an hour. It also takes a long time to capture a thousand images of each class, so I skipped that and probably only did a couple of hundred. More images give you a better-quality model. And if you tilt the robots up, they won't be detected, because I only captured them upright, sitting on the table.
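Both gotchas above can be caught before a long training run: a label file not named exactly `labels.txt`, and a path with an accidental leading slash. Below is a hypothetical pre-flight check. It is not part of jetson-inference, and it assumes the dataset uses the standard Pascal VOC folder names.

```python
import sys
from pathlib import Path

# Standard Pascal VOC folder names (assumed layout of the dataset folder).
VOC_DIRS = ("Annotations", "ImageSets/Main", "JPEGImages")


def check_dataset(data_path: str) -> list[str]:
    """Return a list of likely problems with a VOC dataset folder (empty list = looks OK)."""
    problems = []
    if data_path.startswith("/"):
        problems.append(
            f"'{data_path}' is absolute and resolves from the filesystem root; "
            f"did you mean '{data_path.lstrip('/')}' relative to the working directory?"
        )
    root = Path(data_path)
    if not (root / "labels.txt").is_file():
        # List any other .txt files, since a misnamed label file is the usual culprit.
        misnamed = sorted(p.name for p in root.glob("*.txt")) if root.is_dir() else []
        hint = f" (found {misnamed})" if misnamed else ""
        problems.append(f"labels.txt not found in {root}{hint}")
    for sub in VOC_DIRS:
        if not (root / sub).is_dir():
            problems.append(f"missing folder {root / sub}")
    return problems


if __name__ == "__main__":
    # "data/robots" is a hypothetical default dataset folder.
    for issue in check_dataset(sys.argv[1] if len(sys.argv) > 1 else "data/robots"):
        print("WARNING:", issue)
```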
The training surprised me with how fast it was: 30 epochs, which are like 30 training sessions, took between 30 and 45 minutes. That was pretty quick, which impressed me.
So yes, I followed the Jetson Nano developer site. They have a tutorial called Hello AI World, and there are a bunch of tutorials there. There are also some nice YouTube videos, better than this one, explained by dusty-nv, one of the people who wrote the software; that's the GitHub repository you can see there. It's built on PyTorch and on TensorRT as well, which is obviously optimized for NVIDIA's hardware. The tutorials go through all the different steps. We're going to have a go at this now, but I won't do the whole thing because it takes a long time; I'll just show you a sample of how it actually works.
So it's demo time, my favourite time. Let me just move the mouse... there we go. I'm on my Jetson Nano, and I've got two windows open here. Let's start with this window. I'm going to run the camera-capture command: I'll type camera-capture and then /dev/video0, which is the USB webcam I've got connected here. I can hold up various things, and you'll see them come up on screen. If I pick this up here, you can see in real time that the camera is picking these things up. So if we want to capture the Otto DIY there, we go to this other window. Let me move the mouse away and bring this window down a bit so we can start capturing.
Here we can set the dataset path and the class labels. I'll choose the class labels file from our python/training/detection/ssd folder. Inside that I have a models folder... actually, let me jump back to the data folder, because we're creating the data. I've got a smars folder in there, and there we go: labels.txt. With labels.txt selected, I can now freeze this image. What am I missing? I think it's just the dataset path; let me set that and pick that folder. There we go. So now I can freeze the image. If I wave my hand here and then freeze the frame, it freezes my hand mid-wave. Once a frame is frozen, the view doesn't update. Now I can draw a rectangle around the object we're interested in detecting. You need to be pretty tight with these boundaries. Anything that isn't drawn closely around the object isn't useful, and neither is drawing too many boxes. Let me just remove that one... ah, there we go, that's what I'm looking for. I just want to get the foot and the head inside the box.
We want it as tight as possible, because we don't want to include any extra noise that would just make for a worse detection. So we make sure all the parts of what we're detecting are inside that bounding box. It doesn't matter if it overlaps with another box. Suppose this were overlapping. Let me unfreeze that now... you can see the Otto is slightly obscured there. That's fine: we'd still do the same thing, freeze the frame and draw around what we can see. It doesn't matter that part of it is hidden, because later images will teach the neural network that the obscuring object doesn't matter. Over time it'll learn that it's not an important part of what makes up an Otto DIY.
In the small window there... it's not easy for me to zoom in on this; let me see if I can. I don't think I can, because it's a screen capture. This little window says Class, so I can go down there and pick an entry from labels.txt. Otto DIY is a different class from the others, and you can see the bounding box changes colour: it's purple for the SMARS Mini, but we want Otto DIY. Then there are x, y, width and height fields, which are just the box's x position, y position, width and height, and we can delete the box if it's a mistake. We're also not limited to one object. This scene has several things we're interested in, so we can label those too. I can draw a rectangle around that one and say it's a SMARS Quad.
Then there's a SMARS, a regular one. I'm not differentiating between that one and the SMARS down here, so I'll just draw a box around that one too; those are both SMARS. At the bottom we have a SMARS Mini, so let's select that as well. You can see it takes a bit of time to do this. Once you've done it and saved it, you have to move everything a little, give it a slightly different angle and do it all again: freeze the frame and draw around each robot again. So it takes ages. You also have to make decisions. For example, should I include the cables, or is that robot really just the main body? The cables aren't something you'll see on every version; these others don't have them. We'll include them here anyway. Let me move that out of the way for a second and bring that one in, then nudge this one to get the edge in. There's also another one here, so we could draw a box around that and then move on to the next frame. Okay, each of them has been assigned correctly. Actually, I don't think this one has; that one should be a SMARS Mini. And so on.
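The capture dialog shows each box as x, y, width and height, while VOC annotations store corner coordinates. Here is a small sketch of converting between the two. It assumes (x, y) is the top-left corner and ignores the 1-based pixel convention some VOC tools use. The function names are illustrative.

```python
def xywh_to_corners(x, y, w, h):
    """(x, y, width, height) -> (xmin, ymin, xmax, ymax), the form stored in VOC XML."""
    return x, y, x + w, y + h


def corners_to_xywh(xmin, ymin, xmax, ymax):
    """(xmin, ymin, xmax, ymax) -> (x, y, width, height)."""
    return xmin, ymin, xmax - xmin, ymax - ymin


def clamp_to_image(box, img_w, img_h):
    """Keep a corner-format box inside the frame so a sloppy drag can't go off-image."""
    xmin, ymin, xmax, ymax = box
    return max(0, xmin), max(0, ymin), min(img_w, xmax), min(img_h, ymax)


box = xywh_to_corners(412, 150, 186, 320)
print(box, corners_to_xywh(*box))  # (412, 150, 598, 470) (412, 150, 186, 320)
```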
Once we've done all that, we need to roughly count how many images we've saved, check whether we have a thousand of each type, and check that we have them from every angle. I've just added one there; let me unfreeze that. Should we capture one at an angle like that? It depends how we expect the model to be used. If our robots are never going to see a robot on its side, maybe we don't need to capture that. But for complete detection from every angle, maybe that's something we should capture too. You can also see it says Current Set: train, validation or test. We can merge all the sets for speed and make all three the same, but that isn't best practice. Best practice is to keep them separate, with roughly half of your images for training and the rest split between validation and testing. So yes, as Hypothetical says, data collection is the most time-consuming part.
Let's move on. I'm going to close that window, which stops that little Python script; it's all done in Python. You can see it says it has shut down completely. It's been using something called GStreamer to bring in the video, and the video is very, very fast, as we'll see when we run the finished version. I'll show you the intermediate steps in a second, but first I want to get to the interesting part of the demo. Once we've played with it, we'll come back and look at what's happening in a bit more detail. A window will pop up in a second and detect all the objects in the scene.
It only takes about 30 seconds to start up. There we go, everything looks good. I've typed all the right things, because I tested it before going live. Then the window will open with bounding boxes around all the objects. Come on... I knew this would take about 24 seconds when I was testing it. Come on. Okay, you can see it hasn't detected the SMARS that's on its side, but the second I flip it over, it detects the SMARS correctly. Let's take some things out of the scene (sorry if the audio is cutting in and out; I'll try to keep that there). I'll take all of these out and try one thing at a time. So there's our Otto DIY. It's detecting it, though it's not as confident as you'd expect: 71-something percent. Let's rotate it and see how it goes. Obviously I haven't captured any images of my hand, so it's guessing that my hand is a SMARS, which is interesting. But you can see that as we turn the robot round, it detects it quite confidently. There we go. Let's try moving a leg. Even at an angle, it's still happy with that, and I hadn't recorded any images at that angle, so that's quite interesting. Now let's put down a SMARS robot. It detects that happily, because there are loads of SMARS images from almost every angle in the dataset. I didn't take as many images of the SMARS Mini, so let's put it down there: it's detecting it. You can see we can partly obscure the robots and it still detects them confidently, though it doesn't detect the Otto so well that way. Let's not do that. Yes, the second the Otto goes behind, the model isn't so sure there's something there to detect. One of the robots I didn't include in the image capture was the Weather Bot, so I'm going to bring the Weather Bot in now. Oops, I just destroyed my Weather Bot there. If I bring the Weather Bot in, the model simply ignores it.
is something it should detect, so it ignores it completely in our scene. Oops. Similarly, this SMARS quad: it has never seen this quad in any of the training, and I guess it's confident that it's a quad robot, so I'm very impressed with its ability to detect that. Again, if we bring in the other one that it has trained on, it's very happy, 98%, that that's a quad robot from almost every single angle. I was curious to see as well: if you tilt this up, does it continue to detect it? You can see there that it hasn't got any data to say that's a quad robot, so it's never detected it from that angle before, but side on it's very, very confident that it knows what that is. OK, let's move that out of the way. Then another SMARS robot that it has never seen before, so let's see if it detects this as a SMARS. That one has the line sensor module on, and it's still happily detecting it as a SMARS robot even though we never trained on that, so it's detecting these objects, these features of them, quite confidently. Let's bring in some other SMARS robots. Bring in that one... I think it's calling that a quad? That's interesting, since it's never seen this one before. Oh, it's now detecting that as... maybe it was detecting that one behind. Actually, let me move out of the way. So yeah, it's detecting that as a quad, because this feature is not something it's seen before. We take that off and it detects it as a SMARS... nope, still detecting that as a quad... now it's detecting it as a SMARS. To be fair, I didn't train it on that, so it has no way of knowing that particular part belonged to a SMARS robot. So you can see how fast this is: it's running very, very fast in real time. It's very impressive how quickly it can do that. I'm not saying we could use this in, you know, a road traffic situation, but it's certainly good enough for doing real-time object detection for us. Now, if you were watching James
Bruton's video, he had a similar kind of setup: he detected triangles, squares and circles and then had his robot drive towards them. He was using the position of the object, which he'd pulled out in the Python script and edited, so depending on where the x and y were, he could move the robot left or right to centre the object. Once it was centred, he would drive towards it so that its width increased, and when that got to a certain size, the robot would turn and try to find the next object. He had it sort of going in a circle, detecting the circle, the triangle and the square, round and round. So I was thinking we could do something like that. Bringing in our... where's my robot... this one has the wireless charger on the bottom, so we could have a symbol for power (well, maybe not two zigzags, that's probably not a good idea from a historical point of view), some kind of power symbol. It could detect that and drive towards it, and then as it gets closer it could slow down, be a bit more accurate and hop onto its little charging bay. That would be a really cool thing to do, as well as just being able to detect its friends; they could all swarm together, find each other and run away from enemy robots or something, I don't know. So yes, I found this a really, really fun thing to do. Let's get back over to the NVIDIA, stop this for a second, and let me show you how we go about running the training program. I've just got this connected by HDMI input, so I'm not doing any kind of screen-share thing; that's why it's so fast. I did look at using VNC, but it was very, very slow, and you wouldn't have got the full flavour of just how fast this runs. OK, let me exit this, in fact, and I'll take you through how we do this one step at a time. So let's just type exit. OK, so the first thing I
did was go to GitHub and clone the repository, the jetson-inference repository. Once I had that on my machine, I went into this folder here. There are a couple of scripts in there, and one of the first things is docker, so let's just cd into docker. Apologies that this is so small; let's zoom in a bit so we can all see what's going on a bit more easily. OK, one or two more. So there is a folder called docker, and inside that docker folder there's a bunch of scripts: build, pull, push, run and tag. All I've done is run the run script, so if I just jump back and type in docker/run.sh (docker, not "lockett")... If you've not come across Docker before, Docker is a containerization technology. Think about container ships with lots of containers on them: Docker lets you run lots of different pieces of software in isolation, each within its own container, and you can very quickly download containers, update them and distribute them; you can do all kinds of clever stuff with them. Let me just type my password in properly here; I changed it from the very simple password. Good grief. There we go. Right, so we're now running that Docker instance. The container, you can see, is called dustynv/jetson-inference:r32.6.1. And Hypothetic says "I do not grok Docker." So, in the container, the volume is a folder, which is that GitHub folder I downloaded, but the container actually has it as its root, so you can't see anything outside of it; it's just brought in. The other thing it's brought in is the device called video0, which is the web camera I've used. I have actually got the Raspberry Pi MIPI CSI camera on there, but that isn't working at the moment for some reason, and I'm not sure what I'm doing wrong there. So I plugged in a USB camera, which works fine, so
we're using that for now. Now that I'm in that Docker instance, we can see a bunch of folders there. I'm just going to go into the build folder; let's have a look at what's in there. There's download-models, which you can run to see what's available; let's run it, actually. If we just run download-models, you can see there's a whole bunch of different models: GoogleNet, MobileNet, ResNet, AlexNet, Inception, and there are different ones for different purposes. The object detection ones, as you'd expect, are very good at detecting objects. The Inception one, which is huge, has all kinds of objects in there, all kinds of office and household objects. Then there's PedNet, MultiPed, FaceNet, DetectNet; you can see there dog, bottle, chair, airplane. There's mono depth, and there's pose estimation: think about the Xbox 360 or Xbox One with the Kinect sensor on it, which could detect what body pose you're in and where your limbs were. The pose estimation models do that, so they can detect all the different parts of your body and therefore what position they're in. Then there's semantic segmentation, which includes Cityscapes, so that's good for driving down the road and detecting street signs, pedestrians and all that kind of stuff very accurately. So there's a whole bunch of them, with some legacy ones in there as well, plus image processing, and that's it. If you select any of them, the script will download them for you. I played around with the fruits one just to get familiar with it. Then if we go into aarch64, which is the architecture of this board, and into the bin folder, there's a whole bunch of programs in there. camera-capture is what we used in that window up there; when I ran that, it was actually camera-capture, and the only parameter I gave it was /dev/video0, and video0 is the webcam.
detectnet is the next program we ran; that's the one we ran in this window. It took the model we'd built and ran that particular model, so detectnet is what we used there, and there are a few other pieces in there that we probably don't need to look at. So what we're going to do is come out of that (oops, maybe if I'm in the correct window), back out to the first folder, and then go into the python folder. In this python folder there is a training folder, so let's go into that. That has classification, detection and segmentation, and we're going to go into the detection folder. Then there's the SSD model, so we go into the ssd folder, and finally in here there are a couple of scripts we use to build and then test our model. The three folders we're interested in are data, models and vision. The data folder is where I stored my SMARS dataset: I created a folder called smars, and in there are Annotations, ImageSets and JPEGImages, and then there's labels.txt. Let's ignore that other classes file; we don't need that one. If we just have a look at what's in labels.txt (I'll just cat it), we can see it simply says smars, quad, auto diy and smars mini; that's all there is in that file. If we now go into JPEGImages, we can see there's a whole load of images, and we can actually open that folder up. Which one is it... rubbish. In Ubuntu, do you know, I can't remember how you open a folder in this file thingy. There it is, let's go for that; I don't know if that's the right one. File Manager, that looks good enough for me. OK, so if we go back to where we were: jetson-inference, then python, then the training folder, then detection and ssd, then the data folder, then smars and then JPEGImages. So let's
have a look at one of these. There we go: an awesome picture of some SMARS robots just hanging about. You can see what I did there: I took a picture, moved them around a bit, took another picture, and so on. Using that software, the camera-capture software, it creates an annotation file for each one of those images. If we have a look at this particular XML file here (let me just move over a bit), you can see it's an annotation file: there's the file name, the folder is smars, the source is smars, the annotation is custom, the image is custom, the size is 1280 by 720, so it's 720p, the depth is 3 because it's RGB, and there's no segmentation information in there. That one I actually hadn't labelled, so let's pick one a bit further down that has a label in it. Where are we... I was looking for the actual name; let's try another one at random down here. There we go, that's better. This one has a few different objects in it. There's an object called smars; it hasn't got a pose, so that's Unspecified, and then there are x min, y min, x max and y max, which specify the coordinates of the box around that object within the image. So this is what it generates, which is just raw data. Once we've got all those things done, we can jump back out of there, and back again. OK, then we've got train_ssd, which is what we'd actually use. If I just do python --version, we can see we're on version 3.6.9, so we're not massively behind the times; what is it now for Python, 3.9? So if we now do python train_ssd.py... it does need to know lots of information; you can't just run it bare. It needs to know the model directory, so that's models/smars. It needs to know the labels file. Hey, Adam.
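The annotation files described above use the Pascal VOC XML layout: a `size` element, then one `object` per labelled robot with a `bndbox`. The minimal sketch below reads one with Python's standard `xml.etree.ElementTree`. The `read_voc` helper is hypothetical, and the file name and box coordinates in the sample are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in the Pascal VOC layout described above;
# the filename and box coordinates are invented for illustration.
SAMPLE = """
<annotation>
  <filename>example.jpg</filename>
  <folder>smars</folder>
  <size><width>1280</width><height>720</height><depth>3</depth></size>
  <segmented>0</segmented>
  <object>
    <name>smars</name>
    <pose>Unspecified</pose>
    <bndbox><xmin>412</xmin><ymin>230</ymin><xmax>655</xmax><ymax>468</ymax></bndbox>
  </object>
</annotation>
"""

def read_voc(xml_text):
    """Return ((width, height, depth), [(label, xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_text.strip())
    size = root.find("size")
    dims = tuple(int(size.find(tag).text) for tag in ("width", "height", "depth"))
    boxes = []
    # An image that was never labelled simply has no <object> elements
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        coords = [int(float(bb.find(t).text)) for t in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.find("name").text, *coords))
    return dims, boxes

print(read_voc(SAMPLE))
# ((1280, 720, 3), [('smars', 412, 230, 655, 468)])
```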
So labels equals... that's not under models, that's in data, under smars, and then labels.txt. What else does it need to know? The data, I think; is that data? I can't remember off the top of my head what this one is called, but you need to specify where all those images are, so again that's under data/smars. Let's just try that and see what happens. I might have missed something out; it will tell me if there's an error. It definitely tells you if there's an error, and it's horrible. It takes a couple of minutes to get started. Yes, I've missed something: it doesn't recognise labels=, so is it label, singular? Let's just try that. I'll give up if that doesn't work, because it gets really, really complicated, but you can see there's a whole bunch of options there: datasets, the base directory, scheduler, epochs. Yeah, we've not specified the epochs. So that's where it's best to go to the documentation, which we'll do now, actually. Let me load up GitHub and I'll show you how we get more information about this. If I just go there, I can share my screen. There we go. Dustin Franklin is the person who created this repository for NVIDIA, and this jetson-inference repository is where all the good stuff lives. They've got all kinds of detail there: image classification, object detection, semantic segmentation. There's a better view of it there: you can see that each object is coloured entirely in a particular colour, and the street signs and the trees are all separated, segmented like so. Then there's pose estimation, with some people there, showing the limbs and which way up they are; really cool stuff like that. What I did was simply follow one of these guides on how to set up your Jetson. There are several options: you can build it from source, or you can use the Docker container, which is what I did, and you just type in those three commands. We clone the repository, which is the one we're looking at now, we go into the folder, and then
we just run docker/run.sh, and it grabs everything it needs, sets everything up, and then you're good to go. It then talks you through all the other steps, such as training your model, so you can find the bit on training. There we go: it talks about transfer learning, the fact that the network has been trained on other objects before, so it already knows how to detect... well, I say "knows": the weights have been trained and tweaked to the point where the network takes on new information a lot more easily. We don't have to bother installing PyTorch; that's already installed by default, so we don't need to mess about with that. What else has it got there... in fact, there it is: re-training SSD-MobileNet. So the command we are looking for, the droids we are looking for, is around about here somewhere. There we go: data, model, batch size and epochs. Right, I'm going to keep that on one screen and flick back to the other. Let me see if I can get this correct. Instead of that model directory stuff, we just need to say --data=data/smars, then the model directory, so --model-dir=models/smars. It helps if you type everything correctly. Then the batch size, which is how many images it processes at a time. I'm only on the 2 GB version of the Jetson Nano, so I stick to a batch size of two, two images at once. Then epochs is how many full passes over the training data it will run; let's just do one for now. Now, if I've typed everything correctly there... what's it not happy with? What have I typed wrong? models, model directory, that looks correct... there are two dashes in there, that's why it's not happy. So let me go back and run it again. I just wanted to get to the point where you can see it doing stuff. Right, it doesn't know what the labels are.
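Two small pieces of bookkeeping sit behind the batch size and epoch settings. First, each training step processes one batch, so an epoch takes about the number of training images divided by the batch size. Second, the training script saves a checkpoint per epoch, and the one worth exporting is the one with the lowest loss, which is not necessarily the last. The minimal Python sketch below covers both. It assumes checkpoint names of the form `mb1-ssd-Epoch-N-Loss-X.pth`; that pattern is an assumption here, and onnx_export.py does its own selection. The 140-image count is made up for illustration.

```python
import math
import re

def steps_per_epoch(num_images, batch_size):
    """One step per batch; a final partial batch still counts as a step."""
    return math.ceil(num_images / batch_size)

# Assumed checkpoint naming, e.g. "mb1-ssd-Epoch-3-Loss-2.87.pth"
CHECKPOINT = re.compile(r"-Epoch-(\d+)-Loss-(\d+(?:\.\d+)?)\.pth$")

def best_checkpoint(filenames):
    """Return the checkpoint filename with the lowest loss, or None if none match."""
    best, best_loss = None, math.inf
    for name in filenames:
        match = CHECKPOINT.search(name)
        if match and float(match.group(2)) < best_loss:
            best, best_loss = name, float(match.group(2))
    return best

print(steps_per_epoch(140, 2))  # 70
print(best_checkpoint([
    "mb1-ssd-Epoch-0-Loss-5.21.pth",
    "mb1-ssd-Epoch-1-Loss-2.87.pth",
    "mb1-ssd-Epoch-2-Loss-3.02.pth",  # later but worse, possibly overtraining
    "labels.txt",
]))  # mb1-ssd-Epoch-1-Loss-2.87.pth
```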
That means... so let's just tell it where the labels file is. Is it labels=data/labels.txt? I'll give up if that doesn't work, and you can have a play with it yourselves. Yep, it's not happy with that labels argument. I can't remember whether it's label, label-file or labels; I could be here all day without working that out. Anyway, I can jump back to this other screen and show you what it said. This is what you would see: a timestamp, then "Epoch 0" and a number of steps (I think there were about 70 steps when I was doing this), and then it tells you what the loss is. That starts out as a really high number and comes down, and you want it to end up under one, ideally something like 0.1, so the smallest possible number, because that means you've got a really well-trained network. After that, it saves the model out to something like this: it uses the model name, mb for MobileNet, ssd for single-shot detection, then Epoch and the epoch number, then what the loss was, and it has .pth as the file extension. The more epochs you run, the more of these you get, one per epoch. Then we need to convert the model to an Open Neural Network Exchange (ONNX) file, so we just run onnx_export and point it at the model directory. In theory that's all you need to do; I had to hack around with it a little bit. It looks through each of the epochs and picks the one with the lowest loss, because that might not be the last one: you can actually overtrain these models, and they get worse at detecting things. So there is an art to choosing how many epochs to run; it could be anywhere between 30 and 300, depending on how much time you're willing to throw at this. OK, so that's what we can do. There's a whole bunch of other example datasets we can train on in there: there's fruit, there's airplanes,
there's toys, there's all kinds of stuff. But I was really keen to detect my own objects, to see just how easy this is to do. And it is easy; it's just time-consuming and a bit fiddly. I've labelled it intermediate because I don't think it's an expert skill: you're just following a script from someone, so it's quite straightforward. OK, let's have a look at some of the comments that have come through. I've not been ignoring you, I promise, so let's see what people have been talking about in the chat. Let me also throw up the overlays, because there are a few I was missing. OK, right, we have quite a few people on the stream today, good grief: 12 people. So Carlos, hey Carlos, how are you doing? Nice to have you on the stream today. And Richard, you were saying every day is like Christmas for me. Yes, I bought a whole bunch of things I want to show you. One of the first is one of these M5Stamp Picos, which I was talking about with Adam on last week's show; it's an ESP32 in a little form factor. I've just been having a play with it; I've not actually done anything with it yet, but I had ordered it and it came in the post. I've got some other things I wanted to show as well. One of the shows I'd like to do soon is about infrared, using infrared to control this robot; it comes with a little remote control, so we can sort of send data to it and have it move about. Actually, I think this one is send-only; I don't think you can send and receive with it. Also, in the Small Robots group, somebody was having an issue with some nRF modules, so I bought a pack of them. Let me just show you what's in here. In here we have (I don't know if you can see that; let me get it on there) a module, and
it has an antenna as well. These also come with a little transmitter module and receiver module that it plugs into, and there are Arduino Nanos as well, so essentially there are two Arduino Nanos and two sets of senders and receivers. I'm going to have a play with those and see if we can get a remote-control SMARS using nRF as well. So that's why it's like Christmas every day for me. I think those were the things I was going to show you; I don't think there's any more just yet, although there probably is, since I'm getting so much stuff every day. Hey, D Johnson, how are you doing? Hypothetic is saying they've got TinyML running on a Feather RP2040 and a Teensy 4 with MicroPython 1.16, and they've seen it running on CircuitPython too. Awesome. I do believe you can use TinyML on Raspberry Pi Picos and ESP32s. I've got a Teensy as well; I was looking at it earlier, and I think it's a version 3. But I digress; I get distracted easily. It's in here somewhere. I did find the camera, though, the ESP32-CAM. It's not particularly easy to plug in (you need one of these FTDI adapters), but this camera has an SD card slot on it as well, and it's just an ESP32 chip, which you can see on the back there, and apparently you can stream video from these. So I was thinking we could stream from this camera; it could be mounted on top of a SMARS robot or something like that. I'm sure Kev Thomas has done something like this already. It could drive about streaming its video, and then you could use the power of another machine, even a desktop computer, to process that through a neural network and do object detection and whatnot. Hypothetic is also saying that using different backgrounds can help during capture, because of the background noise; you do want that noise to be
filtered out, and a good way to do that is to have a kind of busy background as well. A question for later about training: would putting an object on a turntable to change the angle help at all? It would; you would certainly get many angles that way. I do actually have a turntable just behind that white thing; sometimes you can see it. There's a whole bunch of stuff sat on it; that white thing just there is my turntable. So that would certainly help, but you do want to capture the object from as many angles as possible, and a turntable might be a bit too uniform if the object is just pivoting in place. You want it a bit closer and a bit further away too, because the closer an object gets, the more distorted it gets on the camera, and as it moves further away you get that perspective effect. So you want these objects from many different angles so the network can take that into account, possibly even from different cameras, though it probably makes sense to train on the camera you're going to use for detection. So yes, I'd say a turntable certainly makes capturing easier. Hey Wayne, how's it going? And Hypothetic says it's actually not bad to have images where part of the object is covered; the AI will get smarter that way. Absolutely: it then learns what is and isn't part of the object, so it's quite good to do that. "Collecting data is the most time-consuming part of deep learning." Amen to that. The data is most of the process: if you put garbage in, you'll get garbage out. I was even thinking about putting that exact phrase on one of the slides; I was thinking, you know, quality in, quality out. And Hackanis87 says "that is so cool". It is so cool; honestly, I have so much fun playing with this. "Be sure to take some images of them falling down." Yes, so if a robot is tipped over, we want
that included as well, and from every angle too, so you just have to spend a whole load of time on this; it needs much more data. In fact, one of the projects I've got on the back burner that this would be perfect for is Twitcher Pi. It's on my GitHub, and the idea was that if you've got a bird table with a Raspberry Pi Zero pointing at it, you can detect that there are birds on the table and classify them as different types of bird. I went as far as getting the top 20 birds from the RSPCA, the Royal Society for the Protection of Birds, you know, the most common English birds you'll see in a garden, and then had it classify each of them. I didn't have a bird table at the time, or a camera set up outside to take pictures, so I simply went on the web and downloaded 100 pictures of each type of bird, and there were 20 birds, so it was 2,000-odd pictures. Is that maths right? And yeah, this took a very long time, drawing the rectangles rather than just classifying the images, and so on. What worked well was if you gave it a picture of a bird from the side, or from the front, doing some elaborate thing; it would work fine. But the back of it... no one takes pictures of the backs of birds, of the avian variety certainly. Because of that, it only works when the birds are front on or side on, or taking off, something like that. So that's one of the things I was thinking about working on a bit further, using the knowledge I've got from this as well. I know how I could do it: I was thinking about making it a bit more automatic with a presence sensor, like one of the infrared sensors from one of the kits, which is just there, actually. That's the Pico starter kit, and in there is an infrared motion sensor; that's the one
I'm looking for. So if I had a motion sensor set up to trigger taking some pictures, it could do half the work, which is taking pictures of the bird table while there are birds on it, and then I'd only have to go through and draw rectangles around them. I was thinking about some logic where, when it triggers, it takes a picture but then waits two or three minutes before it takes another. So that's one of the things I was thinking about there. Hey Tom, how's it going? Hypothetic says, "I do not grok Docker." Grok is a Unix command, isn't it, for finding things? Adam is saying good evening; how are you doing, Adam? "So you prefer the MATE window manager instead of the stock one." I hate that stock one, I've got to say. I much prefer whatever they use on the Raspberry Pi; I quite like that one. "So is this on George?" George, is that what we've called our AI? I don't know, have we given it a name? I know we've got Alf, which is the other AI we've been working on. Maybe this should be called George from now on; I think you've just named it. "I'm going to have to rewatch this later because I'm getting distracted." I get distracted, and I'm the one doing the stream, so yeah, I'll watch it back too at the end. "And I need to transfer my Pi camera to the Jetson Nano." Right, I have done this; let me set this camera up so I can show you what's going on. I've got a really messy desk as well, I must apologise. Right, if I go for that... there we go. This is my SMARS XXL, and I've just stuck the camera on there for now. It can actually go behind it, in a little hole there, but for now I've stuck it on the front. It's connected the right way round, but for the life of me I can't get it to work, so I don't know what I'm doing wrong there. Apparently it should just pick it up, but it's not working. "That is a real PITA to do, though." PITA? Is this some
kind of gag that I'm not getting? "That was not the droid you're looking for." Absolutely not. "I keep losing sync with the stream; I'm going to jump out and watch later." When you say out of sync, do you mean you're not following the conversation, or is the stream stuttering a bit? Because I do check; yeah, it looks healthy. I check the stats, and I make sure I'm on a wired connection, not Wi-Fi, before I do these. "Does it work with other sources, e.g. audio? Could you get it to detect bird calls?" Absolutely. You don't have to do just image detection; there's all kinds of audio detection too. In fact, I've got one just there, an Amazon device, and that's pure speech recognition. There's a whole load of technology around wake words. I thought that was as simple as running sound through a speech recognition engine, but wake words are a separate thing in themselves, because they have to listen through all kinds of background noise and then fire off the speech recognition. So yes, you could definitely do that, though I've not looked at any of it yet. And although we think of the Jetson Nano as being for graphics, its GPU is simply very good at crunching numbers very quickly, so it doesn't matter that it's a graphics processor; it can do audio just as quickly. "Still waiting on those prototype PCBs." "And the ESP32-CAM has heating issues." Right, mine has that; mine gets really hot. This was probably an early version. I think I might have ordered a couple of newer ones before this stream. So yeah, garbage in, garbage out is definitely a recognised phrase. "RSPB." Is that what I said? Did I say RSPB? I have no idea. The Royal Society for the Protection of Birds, not the Prevention of Birds; that's a different society. Oh, George Jetson, of course, that's why it's called George. Yes, George Jetson, I follow you now. Yeah.
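The Twitcher Pi idea mentioned earlier, where a motion sensor triggers a picture and nothing else is captured for two or three minutes, is essentially a cooldown timer. The minimal Python sketch below shows only that logic. The `CooldownTrigger` class and its 150-second default are hypothetical, and reading the infrared motion sensor and taking the photo are left to the caller.

```python
import time

class CooldownTrigger:
    """Allow one capture per motion event, then ignore motion for a cooldown."""

    def __init__(self, cooldown_s=150.0):  # 2.5 minutes, an assumed value
        self.cooldown_s = cooldown_s
        self._last_capture = None  # timestamp of the previous capture, in seconds

    def should_capture(self, motion, now=None):
        """Return True if a picture should be taken for this sensor reading."""
        if now is None:
            now = time.monotonic()
        if not motion:
            return False
        if self._last_capture is not None and now - self._last_capture < self.cooldown_s:
            return False  # still cooling down after the last picture
        self._last_capture = now
        return True

trigger = CooldownTrigger()
print([trigger.should_capture(True, t) for t in (0, 60, 150, 200)])
# [True, False, True, False]
```

Passing `now` explicitly keeps the logic testable without waiting minutes. On a real bird table the caller would pass the sensor reading and let the default `time.monotonic()` supply the time.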
So, "it's tidy compared to my flat." Right, this really is not tidy; this is really bad. I've got a mound of stuff here that I really don't want to show you much more of, just a whole load of bits that need tidying up and putting away. There's a failed version of that SMARS XXL: I'd blocked out the size of the Jetson Nano in the 3D model, like this, and I actually left that 3D block in the design, so when I started printing I was like, why is it printing that? So yeah, that was kind of stupid. All right, so I said RSPCA, the Royal Society for the Prevention of Cruelty to Animals. Yes, yes, it's the RSPB, protection of birds. I think that's called the live adrenaline monster: when you're presenting, a lot of stuff just falls out of your head. It happens. Cool. OK, I hope you enjoyed this video today and seeing how we can detect objects with this. I'm just going to jump back and have a little play with it again, if I can remember which button it is. It's that button there. If I wiggle my mouse to get rid of the screensaver, we can hopefully page up to the right command, which should run. In fact, I'm going to have to jump back... oh, we want to be in the detection folder, then in ssd, and then we want the command that just runs it, which is detectnet. Let me find that; clearly I've typed a lot of stuff since then. There we go, we're not far off it now; that was what I was talking about first. Oh, it's because I exited, didn't I? That's why. So detectnet, what's the parameter for that? Is it model... do you know what, I can't remember the whole command. I need to write a script so I can just type "go" and it'll do it, because I'll forget what it is, and you don't want to watch me badly typing in commands for the next half an hour. So I'll run that later and have another play with it, but that's fine, you've seen it.
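A "go" script along those lines could be as small as the Python sketch below, which builds the detectnet command for the custom ONNX model and runs it. It assumes the model was exported to `models/smars/ssd-mobilenet.onnx` with `labels.txt` alongside it, following the jetson-inference re-training tutorial, and that it runs inside the container. `detectnet_command` is a hypothetical helper. The script avoids `shlex.join` because that needs Python 3.8, and the container shown earlier has 3.6.9.

```python
import shlex
import shutil
import subprocess

def detectnet_command(model_dir="models/smars", source="/dev/video0"):
    """Build the detectnet argument list for an SSD-MobileNet ONNX model."""
    return [
        "detectnet",
        f"--model={model_dir}/ssd-mobilenet.onnx",
        f"--labels={model_dir}/labels.txt",
        "--input-blob=input_0",  # layer names used by the exported SSD-MobileNet
        "--output-cvg=scores",
        "--output-bbox=boxes",
        source,  # the USB webcam; a network stream URL is another possible source
    ]

if __name__ == "__main__":
    cmd = detectnet_command()
    # Echo the command so it can be copied and tweaked by hand
    print(" ".join(shlex.quote(arg) for arg in cmd))
    if shutil.which(cmd[0]) is None:
        print("detectnet not found on PATH; run this inside the jetson-inference container")
    else:
        subprocess.call(cmd)
```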
That's it. "If you don't make mistakes, it's not live." Absolutely. This isn't Blue Peter: "here's one I prepared earlier." Blue Peter was a kids' programme back in the 80s, and I think it's actually still on the BBC. They always used to have a really complicated build and then just pull one out from another desk: "here's one I prepared earlier." And it's like, yeah, with production assistants, the production values are way, way higher than anything a child could achieve with some cardboard. "Can it deal with multiple inputs, by the way, like three streams from three different sources?" Yes, it can. I think the chip is really designed for just one stream, though, and the 2 GB of memory certainly gets used up pretty quickly; in fact, you get low-memory warnings. But yes, you can do that, and it doesn't have to be a local camera either: you can use RTSP to bring the video across from something else, which is why I was thinking about that ESP32 camera, wherever it's disappeared to. There it is. I was thinking about bringing video across from that and running it through the NVIDIA Jetson, so we could do detection for a humble SMARS. This just needs 5 volts, so it can literally be stuck onto another robot. It broadcasts over its Wi-Fi, so there's no processing happening on it; it's just sending out raw video. We could bring that in, process it, and then have the Jetson tell the robot what to do, like go forwards, go backwards and so on. "Have you already tried running the neural network directly on the microcontroller with MicroPython?" Not yet, Tom. I have been looking for a good example of where that's really practical. I know the Raspberry Pi Pico hasn't got a lot of RAM, around 264 KB, which is tiny, so you have to crunch those Mobile
nets down even further i think you have to make them like an 8-bit version so that it can it's really really lightweight i have seen an example where somebody's um they've got a touch screen like an spi touch screen and you can scribble like a letter three for example and then it will detect that that's a three using um a common uh mobile net for detecting characters and there's only about 26 characters for it to grass so it's not a massive load on there and i don't believe it's too slow either it is okay at doing that so yeah i'm looking for a good example if you can find one just drop me a message and i'll look into that i'm definitely looking into seeing how how we could expand upon this and what else we could do with it which is saying uh have two smiles looking for each other exactly i was thinking like maybe them chasing each other but using image recognition so maybe they could have a little symbol on the back or maybe you could just get them to detect different parts of the robot so you can detect that that's the back of a smart rather than the front of us mars so we could certainly do that as well um do you see my message about the meaning of pita i probably missed that actually let me just scroll back um i don't know what you said about that ii saw you you said a pun but i missed what the meaning of that is so i didn't see that come up on the stream tom says that's why i use esp's absolutely because they got loads more they're a bit faster and they've got loads of memory and they've got the wi-fi as well so yeah i'm definitely sold on them cool cool okay so i think that 's everything i wanted to cover off on the show um i don't think there's anything else i was going to cover off and we have gone a little bit over there ah youtube probably blocked it it probably did yeah you don't want to get yourself banned on on youtube that wouldn't be fun at all um so i don't know i can't see that in there got you yep yep i have to be careful as well what i put on 
screen because uh if it detects that and it decides that that's like a foul language or something i can get like a takedown strike on that so that's fine cool okay so thanks for joining me on that one i hope you enjoyed it as much as i did um let's see where else this takes us this was just my first introduction to jetson and using the ai stuff that's on there the deep learning i'm really interested what your thoughts are where we should take this what we should do how we should build it into our small robots and bring them to life and make them more interesting so i shall see you next time hopefully we' ll do a midweek video on uh maybe one of these projects uh if not i shall see you uh for the stream on sunday next time thanks everybody for watching bye for now so so so you
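For the idea of having the Jetson process a robot's camera stream and tell it what to do (or having one SMARS chase the back of another), one simple approach is to steer on the position and size of the target's bounding box. This is a hypothetical Python sketch, not something shown on the stream: the Detection tuple, the class label "smars_back" and the threshold values are all assumptions, and the returned strings would still need to be sent to the robot over whatever link it uses.

```python
from typing import List, NamedTuple, Optional


class Detection(NamedTuple):
    """Hypothetical detection record: class label, confidence and box in pixels."""
    label: str
    confidence: float
    left: float
    top: float
    right: float
    bottom: float


def pick_target(detections: List[Detection], label: str,
                min_confidence: float = 0.5) -> Optional[Detection]:
    """Return the largest confident detection of `label`, or None."""
    candidates = [d for d in detections
                  if d.label == label and d.confidence >= min_confidence]
    if not candidates:
        return None
    return max(candidates, key=lambda d: (d.right - d.left) * (d.bottom - d.top))


def drive_command(detections: List[Detection], frame_width: int, frame_height: int,
                  target_label: str, centre_band: float = 0.15,
                  stop_fill: float = 0.4, too_close_fill: float = 0.6) -> str:
    """Map the target's box to 'search', 'left', 'right', 'forward', 'stop' or 'backward'."""
    target = pick_target(detections, target_label)
    if target is None:
        return "search"  # nothing seen: spin on the spot or wander
    # Horizontal offset of the box centre: -0.5 at the left edge, +0.5 at the right.
    offset = (target.left + target.right) / 2 / frame_width - 0.5
    if offset < -centre_band:
        return "left"
    if offset > centre_band:
        return "right"
    # Box height as a fraction of the frame is a rough proxy for distance.
    fill = (target.bottom - target.top) / frame_height
    if fill >= too_close_fill:
        return "backward"
    if fill >= stop_fill:
        return "stop"
    return "forward"
```

Turning takes priority over distance here, so the robot first centres the target and only then closes in or backs off. With jetson-inference's Python bindings, each detection carries a class ID and Left/Top/Right/Bottom coordinates that could be copied into this tuple.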

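Making "an 8-bit version" of a network for a microcontroller is usually called int8 quantisation. The pure-Python sketch below shows the core arithmetic, affine quantisation with a scale and a zero point, plus the storage saving from 4 bytes per weight down to 1. It is an illustration only, not the method of any particular toolkit; converters such as TensorFlow Lite also calibrate activation ranges from sample data and can quantise per channel.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float, int]:
    """Affine (scale + zero-point) quantisation of a non-empty list of floats to int8."""
    lo = min(min(values), 0.0)  # widen the range to include 0.0 so zero maps exactly
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255 if hi > lo else 1.0  # 256 int8 levels span the range
    zero_point = round(-128 - lo / scale)         # the int8 code that represents 0.0
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize_int8(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Map int8 codes back to approximate float values."""
    return [(x - zero_point) * scale for x in q]


def weight_bytes(num_weights: int, bits_per_weight: int) -> int:
    """Storage needed for the weights alone, ignoring activations and overhead."""
    return num_weights * bits_per_weight // 8
```

Each restored value lands within about half a step (scale / 2) of the original, which is the accuracy traded for a four-fold saving. For scale: even one million int8 weights is about 1 MB, several times the Pico's 264 KB of RAM, which is why microcontroller models have to be far smaller than the MobileNets running on the Jetson.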