Warp-Speed Wednesdays
Your Must-Read Weekly Tech Updates: Voyager Outperforms in Gaming, MegaByte's Architecture, Robotics Revolutionized by LLMs, Brain-Spine Interface Restores Mobility, and Create Your Own AI Avatar
Click the link here to get the full galactic experience.
AI
What makes Voyager an AI prodigy? It's the fact that this autonomous explorer harnesses the power of GPT-4 to continually explore its environment, master increasingly complex skills, and make consistent new discoveries — all without a single drop of human intervention.
In Minecraft, it has demonstrated superior prowess in unearthing unique items, unlocking the game's tech tree, traversing different terrains, and applying its learned skills to fresh challenges in a newly minted world.
Now, let's talk numbers. Remember Reflexion and React from our Hitchhiker’s Handbook Chapter II? Voyager has been leaving them in the stardust. It finds 3.3x more unique items, travels 2.3x longer distances, and unlocks key tech tree milestones up to 15.3x faster.
What does this all mean? The fact that Voyager wrote its own code to excel at Minecraft through iterative prompting carries massive implications. It demonstrates that GPT-4 and future, improved LLMs can understand their environment and independently work out how to fulfill tasks.
This ability to self-navigate and master tasks represents a significant milestone in AI development. It suggests that these models do possess a world model and reasoning engine, laying the groundwork for embodied robots to evolve into general-purpose robots — think of this level of reasoning in a Tesla bot or another humanoid robot. I will later discuss the challenge that has plagued the robotics industry for decades… but not anymore!
Voyager is not just a cool AI experiment; it's a stepping stone to creating more powerful and autonomous digital and physical agents without needing to fine-tune model parameters or use data-intensive RL.
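To make the iterative-prompting idea concrete, here is a minimal sketch of that kind of agent loop. This is my own illustration, not NVIDIA's actual Voyager code: `query_llm` is a hypothetical stand-in for a GPT-4 call, and the "environment" is just Python's `exec`. The key pattern is real, though: generate a program, execute it, and feed any error message back into the next prompt until the skill works.

```python
# Minimal sketch of an iterative-prompting agent loop (hypothetical;
# the real Voyager pipeline uses GPT-4 plus a Minecraft execution API).

def query_llm(task, feedback):
    """Stand-in for a GPT-4 call: returns candidate code for the task.
    It 'fixes' its program once the error message appears in feedback."""
    if "NameError" in feedback:
        return "result = sum(range(5))"   # corrected attempt
    return "result = total(range(5))"     # first, buggy attempt

def run_skill(code):
    """Execute the generated code and return (success, feedback)."""
    env = {}
    try:
        exec(code, env)
        return True, f"success: result={env['result']}"
    except Exception as e:
        return False, f"{type(e).__name__}: {e}"

def iterative_prompting(task, max_rounds=4):
    """Generate -> execute -> feed errors back, until the skill works."""
    feedback = ""
    for _ in range(max_rounds):
        code = query_llm(task, feedback)   # ask the model for a program
        ok, feedback = run_skill(code)     # run it, collect feedback
        if ok:
            return code, feedback          # skill learned; store it
    return None, feedback

code, msg = iterative_prompting("add the numbers 0..4")
print(msg)  # prints "success: result=10" after one failed round
```

The loop fails on round one (`NameError`), feeds that error back, and succeeds on round two — the same self-correction dynamic that lets Voyager keep acquiring skills without human intervention or parameter fine-tuning.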
Also, see similar results from OpenGVLab, Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory. 📽
And here’s where MegaByte’s new AI architecture may have some important impact, as all patches in the sequence are processed and generated by the model at the same time.
In other words, although the patches are presented to the user in logical order, the computations needed to decode each patch are done at the same time, dramatically increasing generation speed while reducing costs, since GPU parallelization capabilities are maximized.
And because generation is done byte-by-byte inside each patch, the size requirements for the local model are much smaller, which means the total number of parameters in MegaByte is considerably smaller than in other models like GPT-4 or LLaMA, while rivaling them in performance.
Therefore, as computational requirements are drastically reduced, we can increase the sequence length to millions of tokens. We will see if this architecture bears fruit after more research is conducted. Subscribe and I will keep you informed! Paper walkthrough here.
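To make the patch idea concrete, here is a toy sketch (my own illustration, not the paper's code) of how a byte sequence gets split into fixed-size patches, so the global model attends over far fewer positions than a plain byte-level transformer would, while a small local model only ever decodes within one patch:

```python
# Toy illustration of MegaByte-style patching (assumptions mine, not the
# paper's code): a global model attends over patches; a small local model
# decodes the bytes inside each patch.

PATCH_SIZE = 8

def patchify(data: bytes, patch_size: int = PATCH_SIZE):
    """Split a byte string into fixed-size patches, zero-padding the last."""
    pad = (-len(data)) % patch_size
    data = data + b"\x00" * pad
    return [data[i:i + patch_size] for i in range(0, len(data), patch_size)]

text = b"MegaByte models long byte sequences."
patches = patchify(text)

# The global model's sequence length shrinks by the patch size:
byte_positions = len(text)        # positions a byte-level model must attend over
global_positions = len(patches)   # positions the global model attends over
print(byte_positions, global_positions)  # prints "36 5"
```

Since self-attention cost grows quadratically with sequence length, shrinking the global sequence by the patch size is what opens the door to million-token contexts, and the patches themselves can be decoded in parallel at generation time.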
Robotics
How LLMs are going to aid Robotics — Agility Robotics Jumps In
Flashback: As we look back at pioneers like Honda's Asimo, we admire their ambition, yet acknowledge their limitations — a high price tag (est. $2.5mil/bot) and a lack of adaptability, compounded by a lack of data to embrace AI and Machine Learning.
Envision this: You're perched on the brink of a colossal paradigm shift, where robots evolve from mere manufacturing tools to capable world agents. We're not orbiting around a far-flung fantasy, but the imminent dawn of a new era. By wielding the might of Large Language Models (LLMs) such as the one used with Voyager, we're cracking the cryptic codes that have hindered the robotic metamorphosis for eons.
Agility Robotics LLM Showcase: Today, we witness another step forward in Robotics as Agility's Digit emerges, linking speech with meaningful action, a living testament to the potential of LLMs. This is a simple but deeply powerful demo. Each passing moment brings new advancements, drawing us ever closer to a robotic future that was once the stuff of dreams.
Announcing Figure's $70M Series A, to be put to strategic use revolutionizing the future of robotics.
Their agenda? Propel their robot development at breakneck speed, funnel resources into manufacturing, architect a comprehensive AI data engine, and accelerate commercialization.
As for their progress? They've successfully integrated and tested their first-generation humanoid robot, and they're already elbows-deep in designing Gen 2.
Remember: They teamed up with OpenAI. So, if you thought Agility's Digit, with its LLM, was groundbreaking, prepare to have your mind blown when Figure unveils their bot.
Medical - Brain-Computer Interfaces
Journey to Recovery: The Digital Bridge Restoring Mobility in Paralysis
In a groundbreaking development, researchers and their patient succeeded in creating a 'digital bridge' that restores the crucial line of communication between brain and spinal cord, enabling an individual with chronic tetraplegia to stand and walk freely in everyday settings.
Full video here: 📽
This revolutionary brain-spine interface (BSI) uses fully embedded recording and stimulation systems to establish a direct link between the brain's signals and electrical stimulation targeted at the spinal cord regions responsible for walking. Remarkably, this reliable BSI can be calibrated in just a few minutes, and has proven to be stable over a year, even during independent home use.
The individual utilizing the BSI reports a natural control over leg movements, allowing him to stand, walk, climb stairs, and navigate complex terrains. Even more exciting, the neurorehabilitation facilitated by the BSI has contributed to neurological recovery - the participant has regained the ability to walk overground with crutches, even when the BSI is turned off.
These promising results establish a blueprint for restoring natural movement control after paralysis, offering a beacon of hope for countless individuals worldwide.
Neuralink approved for human clinical trials.
Neuralink has faced challenges and scrutiny from both the public and the FDA. The company was recently in the headlines for animal cruelty, and even Elon was rumored to be frustrated with the team's lack of progress. This is an exciting milestone for the company, and I am glad the road has been arduous for them; it gives me some faith that the FDA is working to support human safety in the face of innovation.
As we conclude, it's important to acknowledge that the notion of augmenting our bodies, particularly our invaluable brains, can be understandably daunting. We are venturing into territory that often feels like it belongs to the pages of science fiction. However, when viewed from the perspective of restoring normality to lives that have been disrupted by debilitating conditions, the moral implications take on a different tone.
The key objective here isn't to create a superhuman race of cyborgs, but to leverage the profound potential of technology to enhance the quality of human life. Helping people regain mobility, autonomy, and a sense of normalcy isn't just a laudable goal - it's a fundamental driving force behind technological innovation. It's about broadening horizons, pushing boundaries, and above all, transforming lives.
The advent of technologies such as the brain-spine interface underscores the remarkable era we live in, where we can bridge gaps that were once considered insurmountable.
How I Made My Latest IG Video Using AI!!!
Video: 📽
Tools
ChatGPT - Develops the draft script, transition points, and captions
ElevenLabs - Records and generates my voice to read the script
D-ID - Digital Humans Video Generator - Generate talking AI avatar video
Google’s MusicLM - Create the background music
Canva - Stitch the video together
SnagIt - Record video clips from online
Process
Note: ChatGPT is a brainstorming, drafting, and final editing tool. I NEVER trust GPT's answers; I tell it the facts. I created the story arc myself. It took me 4 hours to go from idea to finished product, since this was my first time doing a reel of this style and using ElevenLabs, D-ID, and MusicLM. I expect to get this down to an hour with the same, if not better, quality.
Determine the high-level story arc for your video and gather notes for the story.
Prompt ChatGPT (use GPT-4 for the best results) with instructions on what role and response you want from it. See my prompt here.
Note: You should be doing this with GPT for all tasks, because research has shown that LLMs, especially GPT-4, perform better once they know their role. These role instructions are guides for the LLM. Here is an example Andrej Karpathy shared.
Paste your notes into ChatGPT.
Once GPT has finished the first version, you can further direct it to achieve exactly what you want.
📽 Shows how to create your own AI avatar with MidJourney, D-ID, and ElevenLabs.
Google’s MusicLM is straightforward. Prompt it on the music you want for the background.
Finally, put it all together in Canva, or any other video-editing app, and have fun.
Excited to see what you create. Stay creative, Fellow Hitchhikers.