AI Weekly Roundup: Exploring New Frontiers with AudioPaLM, RoboCat, and Voicebox
Alright, we've got some seriously exciting AI news that has been brewing.
First up, Stability AI has thrown a significant curveball in the image AI game. They've introduced SDXL 0.9, a text-to-image model suite that cooks up hyper-realistic images that might just make you do a double-take. And if you're a numbers person, this one has one of the biggest parameter counts of any open-source image model, a whopping 3.5 billion. You can take a peek at it on Clipdrop, Stability AI's platform.
Switching gears, Google has graced us with AudioPaLM, a Large Language Model that doesn't just speak, it listens too. By cleverly merging the text-based PaLM-2 and the speech-based AudioLM models, Google has built a unified multimodal architecture that churns out both text and speech like a boss.
Meanwhile, Google researchers are adding some Hollywood vibes to their AI with DreamHuman, a method that conjures up realistic, animated 3D human avatar models with nothing but textual descriptions as its muse.
Stepping up the game, Meta has unleashed Voicebox, which it describes as the first generative AI model for speech that can perform tasks it wasn't specifically trained for. What's cooler? Like its siblings in image and text, Voicebox can create outputs from scratch and even tweak samples it's been given to produce some crisp, high-quality audio clips.
Over at Microsoft, they've launched Azure OpenAI Service on your data in public preview. What's neat about it? You can run supported chat models like ChatGPT and GPT-4 on your connected data without having to train or fine-tune models.
Google DeepMind has also introduced its newest creation: RoboCat. This AI model is a veritable jack of all trades for operating multiple robots, able to learn new tasks like assembling structures or picking up objects from just a handful of demonstrations and some self-generated training data.
You wouldn't believe this, but Wimbledon is now employing IBM watsonx to produce AI-generated spoken commentary for this year's video highlights. And if you're a stats fan, their AI Draw Analysis is going to be your next favorite thing, leveraging the IBM Power Index and Likelihood to Win predictions to assess each player's potential journey to the final.
Dropbox too is stepping into the AI scene with Dropbox Dash and Dropbox AI. Imagine having a universal search bar powered by AI that connects all your tools, content, and apps. And, it doesn’t stop there. Dropbox AI can generate summaries and answers, not just from documents, but from videos too.
On the automotive side, Wayve is showcasing GAIA-1, an AI model that creates realistic driving videos using a cocktail of video, text, and action inputs, giving you the reins to fine-tune vehicle behavior and scene features.
Opera is raising the browser game with 'One', a browser with an integrated AI chatbot. Meet 'Aria', who's on standby to enrich your content exploration by popping up from text highlights or right-clicks, as well as from the sidebar.
In the audio domain, ElevenLabs has rolled out 'Projects' for early access, a tool that lets you create an entire audiobook without stepping out of the platform. The company has also surpassed 1 million registered users.
Vimeo is now offering new AI-powered video tools, such as a text-based video editor, a script generator, and an on-screen teleprompter.
For creators, Midjourney has introduced V5.2 with an array of features like zoom-out outpainting, improved aesthetics, better text understanding, and sharper images. They've even included a new /shorten command to help you analyze your prompt tokens.
Parallel Domain is helping users build synthetic datasets with its new API, Data Lab, which uses generative AI.
If you've ever wanted to sell your customized AI models, OpenAI might have your back. They're considering setting up an App Store for this very purpose.
OpenLM Research is also stepping up its game with the release of OpenLLaMA 13B trained on 1T tokens, an open-source reproduction of Meta AI's LLaMA large language model.
On the hardware side, ByteDance, the company behind TikTok, has made quite the investment, ordering about $1 billion worth of Nvidia GPUs in 2023 alone, which works out to around 100,000 units.
Lastly, we have GPT-Engineer. You tell it what you want it to build, and it'll ask you some clarifying questions, generate a technical spec, and then write all the necessary code.
In short, it's an exciting time in the world of AI, with these updates pushing the boundaries of what we thought was possible. Let's keep an eye on where they take us next!