Warp-Speed Wednesdays
Your Must-Read Weekly Tech Updates: AI Agents with Enhanced Memory, ChatGPT Controls Robots, Google & Tesla AI Models and Apple VisionPro Developer Tools Release
Greetings after a brief hiatus!
My plate has been overflowing with commitments as I'm juggling full-time work, rounding off my Master's degree in Robotics, and immersing myself in an intriguing AI project I can't wait to reveal later this year.
My posts might be a tad sporadic until we hit August. However, I'm eager to bring you all up to speed with the whirlwind of activities that have been unfolding.
So, buckle up, and let's embark on this catch-up journey!
Autonomous AI Agents
The exploration of large language models (LLMs), notably GPT-4, is revealing that we've only seen the tip of the iceberg of their potential. Engineers and researchers are still working to understand what these models can do, pushing the boundaries of their ability to generate tools (code) and use tools (APIs) to complete complex tasks.
But you may be asking, what is an AI Agent?
An AI agent is a type of software program that acts autonomously within a specific environment to achieve a set of objectives or tasks. It's designed to perceive its environment, make decisions, and take actions based on its programming and the information it receives, ultimately aiming to maximize its chances of success or efficiency in accomplishing its tasks.
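To make the perceive, decide, act loop concrete, here is a minimal sketch in Python. The thermostat setting, the class names, and the numbers are all made up for illustration; a real agent would likely swap the hand-written decide step for an LLM or a learned policy.

```python
from dataclasses import dataclass


@dataclass
class Room:
    """Toy environment (hypothetical): a room whose temperature the agent can nudge."""
    temperature: float

    def apply(self, action: str) -> None:
        if action == "heat":
            self.temperature += 1.0
        elif action == "cool":
            self.temperature -= 1.0
        # any other action (e.g. "idle") leaves the room unchanged


class ThermostatAgent:
    """Illustrative agent: perceives, decides, and acts toward a target temperature."""

    def __init__(self, target: float, tolerance: float = 0.5) -> None:
        self.target = target
        self.tolerance = tolerance

    def perceive(self, room: Room) -> float:
        # Perception: read the part of the environment the agent cares about.
        return room.temperature

    def decide(self, observation: float) -> str:
        # Decision: pick the action that moves the room toward the target.
        if observation < self.target - self.tolerance:
            return "heat"
        if observation > self.target + self.tolerance:
            return "cool"
        return "idle"

    def run(self, room: Room, max_steps: int = 20) -> list[str]:
        actions = []
        for _ in range(max_steps):
            action = self.decide(self.perceive(room))
            if action == "idle":  # objective reached, stop acting
                break
            room.apply(action)  # Action: change the environment
            actions.append(action)
        return actions


room = Room(temperature=18.0)
agent = ThermostatAgent(target=21.0)
history = agent.run(room)
print(history, room.temperature)
```

The agent never gets told how many steps to take; it keeps looping until its own reading of the environment says the goal is met, which is the "autonomous" part of the definition.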
Watch this brief introduction video:
Here is a brief tour of some amazing recent findings.
ChatGPT for Robotics: Design Principles and Model Abilities
The Challenge with Current Robotics: The existing process for robotics involves a technical user or engineer manually translating task requirements into code and continuously adjusting it to correct the robot's behavior, which is a time-consuming, costly, and inefficient process that demands in-depth robotics knowledge.
How LLMs Can Help: LLMs let a user give a robot simple instructions or high-level task goals written in natural language (like English). The LLM develops steps to achieve the goal and then converts these steps into code for the robot to execute.
Here is one example of their open-source prompts and GPT-4’s instructions based on the goal in the prompt.
Updating and saving state lets LLMs remember across long-horizon tasks
Think of LLMs as the brain of a robot, helping it make sense of complex tasks. But these brains have a problem: they're a bit forgetful and can struggle to recall information from a while ago. This is a big issue for tasks that require a robot to remember a lot of details over a long time, like tidying up a house and remembering where everything goes.
Just like how we humans jot down important things in a diary so we don't forget, the Statler authors propose a similar solution for these robot brains. Statler gives LLMs a kind of "diary" or "memory bank," which they call a "world state." This "world state" is like a detailed, updated note of everything the robot has experienced and learned.
Statler works by using two versions of the robot brain — one to read and understand the world (world-model reader) and the other to write down and update what's happening in its world state diary (world-model writer). This memory bank helps the robot brains recall details from much further back, breaking free from their usual forgetfulness.
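The reader/writer split can be sketched in a few lines of Python. This is only a toy under loud assumptions: the two "brains" here are plain functions standing in for LLM calls, the world state is a simple dictionary, and the "put X in Y" instruction format is invented for the example. None of it is taken from the Statler code.

```python
from typing import Callable

WorldState = dict[str, str]  # object name -> current location


def fake_reader(state: WorldState, instruction: str) -> str:
    """Stand-in for the world-model reader. A real reader would prompt an LLM
    with the current world state plus the instruction; this one parses a toy
    instruction of the form 'put <object> in <place>' and returns an action."""
    _, obj, _, place = instruction.split(maxsplit=3)
    if state.get(obj) == place:
        return "noop"  # the "diary" says this is already done
    return f"move {obj} {place}"


def fake_writer(state: WorldState, action: str) -> WorldState:
    """Stand-in for the world-model writer: returns the state after the action."""
    if action == "noop":
        return state
    _, obj, place = action.split(maxsplit=2)
    return {**state, obj: place}


def run_episode(
    state: WorldState,
    instructions: list[str],
    reader: Callable[[WorldState, str], str],
    writer: Callable[[WorldState, str], WorldState],
) -> tuple[WorldState, list[str]]:
    log = []
    for instruction in instructions:
        action = reader(state, instruction)  # read: decide using the current state
        state = writer(state, action)        # write: record what changed
        log.append(action)
    return state, log


start = {"cup": "table", "book": "floor"}
final_state, log = run_episode(
    start,
    ["put cup in sink", "put book in shelf", "put cup in sink"],
    fake_reader,
    fake_writer,
)
print(final_state, log)
```

The third instruction repeats the first. Because the state is carried forward explicitly rather than left to the model's memory of the conversation, the reader can tell the cup is already in the sink and skip the move.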
Here are some early Agent Metrics if you are a developer working with LLMs:
Check the tweet in the link above to see more of my thoughts on improving LLM performance.
Robotics
RoboCat: A Self-Improving Foundation Agent for Robotic Manipulation
Google’s RoboCat, a cutting-edge agent for vision-based robotic manipulation, can solve a vast array of tasks and adapt to new ones through a process called self-improvement. It improves with the scaling and diversification of its training data.
Extensive testing on real and simulated scenarios has demonstrated RoboCat's superior skill acquisition and adaptability compared to traditional models. It can also adapt across robot configurations: it was trained on 5 and 7 DoF robots and only fine-tuned on a 14 DoF robot.
RoboCat is a step toward general-purpose agents for robotics, which could have a huge impact on society. With future self-improvement via RL, RoboCat could bring us one step closer to autonomous learning.
But we must ensure safety and keep humans supervising these agents.
Barkour: Benchmarking Animal-level Agility with Quadruped Robots
Introducing Barkour: Google's dog-inspired agility benchmark for quadruped robots! This benchmark aims to measure and improve the agility of robots, using a diverse obstacle course based on dog agility shows.
“The robots are trained using a student-teacher framework and a Transformer-based policy that enables them to tackle different terrains and obstacles. In addition, a unique recovery policy helps robots to quickly recover from stumbles.”
Barkour's scoring system measures robot agility by timing each obstacle, with scores ranging from 0 to 1. A full score means the robot matched a small dog's roughly 10-second course completion time at 1.7 m/s; penalties apply for skipped or failed obstacles and for slow speed.
Tested using custom-built robots, Barkour has demonstrated its effectiveness in creating robust, versatile and agile robotic movements. I'm excited to see how the field builds on this.
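A back-of-the-envelope version of that scoring might look like the sketch below. The function name is mine, and the penalty sizes (0.1 per failed or skipped obstacle, 0.01 per second over the target time) are illustrative defaults rather than values stated above, so treat them as assumptions and check the benchmark itself for the exact rules.

```python
def agility_score(
    course_time_s: float,
    obstacles_failed: int,
    target_time_s: float = 10.0,           # small-dog reference time from the text
    obstacle_penalty: float = 0.1,         # assumed penalty per failed/skipped obstacle
    overtime_penalty_per_s: float = 0.01,  # assumed penalty per second over target
) -> float:
    """Toy Barkour-style score in [0, 1]: start at 1.0 and subtract penalties."""
    overtime_s = max(0.0, course_time_s - target_time_s)
    score = 1.0
    score -= obstacle_penalty * obstacles_failed
    score -= overtime_penalty_per_s * overtime_s
    return min(1.0, max(0.0, score))  # clamp so the score stays between 0 and 1


# A clean run at dog pace, then a run that is 10 s slow with one failed obstacle.
print(agility_score(course_time_s=10.0, obstacles_failed=0))
print(agility_score(course_time_s=20.0, obstacles_failed=1))
```

Finishing faster than the target earns no bonus in this sketch; the score only rewards matching the reference pace without mistakes.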
Google proves LLMs Can Define Reward Params to Optimize for Robots
Get ready for some robotic magic powered by LLMs! These incredible tech brainiacs are becoming the robot's best friend by expertly tuning the reward parameters behind its 'decision-making', boosting the efficiency of robotic tasks. Imagine them as the ultimate interpreters, turning complex language instructions into simple robotic actions. Where the baseline approach succeeds on around 50% of tasks, the LLM-driven approach knocks it out of the park by acing an astonishing 90%.
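Here is a toy sketch of the idea: the language model's job is to turn an instruction into reward parameters, and a separate optimizer then finds the motion that scores best. Everything in it is hypothetical. The lookup table stands in for the LLM, the parameter names are invented, and a brute-force search over body heights stands in for whatever optimizer the real system uses.

```python
def fake_llm_reward_params(instruction: str) -> dict[str, float]:
    """Stand-in for the LLM: maps an instruction to reward parameters.
    Parameter names and values are invented for this example."""
    table = {
        "stand tall": {"target_height": 0.5, "height_weight": 1.0, "effort_weight": 0.1},
        "crouch low": {"target_height": 0.2, "height_weight": 1.0, "effort_weight": 0.1},
    }
    return table[instruction]


def reward(height: float, start_height: float, params: dict[str, float]) -> float:
    """Higher is better: stay near the target height without moving more than needed."""
    height_error = (height - params["target_height"]) ** 2
    effort = (height - start_height) ** 2
    return -params["height_weight"] * height_error - params["effort_weight"] * effort


def best_height(instruction: str, start_height: float = 0.3) -> float:
    """Brute-force 'optimizer': try body heights from 0.00 m to 0.60 m in 1 cm steps."""
    params = fake_llm_reward_params(instruction)
    candidates = [i / 100 for i in range(0, 61)]
    return max(candidates, key=lambda h: reward(h, start_height, params))


print(best_height("stand tall"), best_height("crouch low"))
```

The robot never sees the words "stand tall"; it only sees numbers it can optimize against, which is why this split between language and low-level control is appealing.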
Tesla is building the foundation models for autonomous robots (legged and wheeled).
See the complexity of the AI architecture that goes into FSD:
Tesla is also building the world’s largest computer to support its AI mission. It will be a mix of Tesla's proprietary DOJO AI-specific chip and NVIDIA’s general-purpose chips. Tim Zaman, Lead of AI Infrastructure at Tesla and Twitter, explains why Tesla is focused on gaining more compute:
“For context, today, our compute clusters have 0.3% of idle time; and 84% of jobs are high priority. We want to be in a place where we have an excess of compute.”
Health Care
Google's ground-breaking AI technology uses eye scans to accurately forecast cardiovascular incidents. In time, it could feasibly make CT scans, MRIs, and X-rays redundant, giving physicians an all-encompassing perspective of a patient's inner health. Google CEO Sundar Pichai says that this breakthrough underscores the revolutionary impact AI can have on reshaping patient care.
“Healthcare is one of the most important fields AI is going to transform.”
Pichai
This innovation opens doors to improved diagnostics and cost reductions. Let's embrace the future of healthcare!