ThursdAI - The top AI news from the past week

By: Weights & Biases. Join AI Evangelist Alex Volkov and a panel of experts to cover everything important that happened in the world of AI over the past week.
  • Summary

  • Every ThursdAI, Alex Volkov hosts a panel of experts, AI engineers, data scientists, and prompt spellcasters on Twitter Spaces, where we discuss everything major and important that happened in the world of AI over the past week. Topics include LLMs, open source, new capabilities, OpenAI, competitors in the AI space, new LLM models, AI art and diffusion, and much more.

    sub.thursdai.news
    Alex Volkov
Episodes
  • 📆 ThursdAI - Spooky Halloween edition with Video!
    Nov 1 2024
    Hey everyone, Happy Halloween! Alex here, coming to you live from my mad scientist lair! For the first-ever live video stream of ThursdAI, I dressed up as a mad scientist and had my co-host, Fester the AI-powered skeleton, join me (as well as my usual co-hosts, haha) in a very energetic and hopefully entertaining video stream! Since it's Halloween today, Fester (and I) have a very busy schedule, so no super-long ThursdAI newsletter today; we're still not in the realm of Gemini being able to write a decent draft that takes everything we talked about and covers all the breaking news, so I'm afraid I will have to wish you a Happy Halloween and ask that you watch/listen to the episode. The TL;DR and show links from today don't cover all the breaking news, but the major things we saw today (and caught live on the show as Breaking News) were: ChatGPT now has search, and Gemini has grounded search as well (seems like OpenAI's streak of releasing something right before Google announces it continues). Here's a quick trailer of the major things that happened:

    This week's buzz - Halloween AI toy with Weave
    In this week's buzz, my long-awaited Halloween project is finally live and operational! I've posted a public Weave dashboard here and the code (that you can run on your Mac!) here. Really looking forward to seeing all the amazing costumes the kiddos come up with and how Gemini will be able to respond to them, follow along! (A minimal sketch of the Weave instrumentation follows after the TL;DR below.)

    ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

    Ok, and finally, my raw TL;DR notes and links for this week. Happy Halloween everyone, I'm running off to spook the kiddos (and of course record and post about it!)

    ThursdAI - Oct 31 - TL;DR
    TL;DR of all topics covered:
    * Open Source LLMs:
      * Microsoft's OmniParser: SOTA UI parsing (MIT licensed) 𝕏
        * Groundbreaking model for web automation (MIT license).
        * State-of-the-art UI parsing and understanding.
        * Outperforms GPT-4V in parsing web UI.
        * Designed for web automation tasks.
        * Can be integrated into various development workflows.
      * ZhipuAI's GLM-4-Voice: End-to-end Chinese/English speech 𝕏
        * End-to-end voice model for Chinese and English speech.
        * Open-sourced and readily available.
        * Focuses on direct speech understanding and generation.
        * Potential applications in various speech-related tasks.
      * Meta releases LongVU: Video LM for long videos 𝕏
        * Handles long videos with impressive performance.
        * Uses DINOv2 for downsampling, eliminating redundant scenes.
        * Fuses features using DINOv2 and SigLIP.
        * Selected tokens are passed to Qwen2/Llama-3.2-3B.
        * Demo and model are available on HuggingFace.
        * Potential for significant advancements in video understanding.
      * OpenAI's new factuality benchmark (Blog, Github)
        * Introducing SimpleQA: a new factuality benchmark.
        * Goal: high correctness, diversity, challenging for frontier models.
        * Question curation: AI trainers, verified by a second trainer.
        * Quality assurance: 3% inherent error rate.
        * Topic diversity: wide range of topics.
        * Grading methodology: "correct", "incorrect", "not attempted".
        * Model comparison: smaller models answer fewer questions correctly.
        * Calibration measurement: larger models are more calibrated.
        * Limitations: only for short, fact-seeking queries.
        * Conclusion: drive research on trustworthy AI.
    * Big CO LLMs + APIs:
      * ChatGPT now has Search! (X)
        * Grounded search results with web browsing.
        * Still hallucinates.
        * Reincarnation of SearchGPT inside ChatGPT.
      * Apple Intelligence launch: image features for iOS 18.2 𝕏
        * Officially launched for developers in iOS 18.2.
        * Includes Image Playground and Genmoji.
        * Aims to enhance image creation and manipulation on iPhones.
      * GitHub Universe AI news: Copilot expands, new Spark tool 𝕏
        * GitHub Copilot now supports Claude, Gemini, and OpenAI models.
        * GitHub Spark: create micro-apps using natural language.
        * Expanding the capabilities of AI-powered coding tools.
        * Copilot now supports multi-file edits in VS Code, similar to Cursor, and faster code reviews.
        * GitHub Copilot extensions are planned for release in 2025.
      * Grok Vision: image understanding now in Grok 𝕏
        * Finally has vision capabilities (currently via 𝕏, API coming soon).
        * Can now understand and explain images, even jokes.
        * Early version, with rapid improvements expected.
      * OpenAI advanced voice mode updates (X)
        * 70% cheaper in input tokens because of automatic caching (X).
        * Advanced voice mode is now in the desktop app.
      * Claude this morning - new Mac / PC app.
    * This week's buzz:
      * My AI Halloween toy skeleton is greeting kids right now (and is reporting to a Weave dashboard).
    * Vision & Video:
      * Meta's LongVU: Video LM for long videos 𝕏 (see Open Source LLMs for details).
      * Grok Vision on 𝕏 (see Big CO LLMs + APIs for details).
    * Voice & Audio:
      * MaskGCT: new SoTA text-to-speech 𝕏
        * New open-source state-of-the-art text-to-speech model.
        * Zero-shot voice cloning, emotional TTS, ...
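    Since the Halloween toy's main trick is reporting every interaction to a Weave dashboard, here is a minimal sketch of what that instrumentation can look like with the weave Python package. The function name and project name are hypothetical; the real code linked above does considerably more (audio in/out, Gemini calls, costume detection).

```python
# Minimal sketch of logging a toy's interactions to a W&B Weave dashboard.
# `greet_trick_or_treater` and the project name are hypothetical; the actual
# Halloween-toy code linked in the post also handles audio and Gemini calls.
import weave


@weave.op()  # every call (inputs, outputs, latency) gets traced to Weave
def greet_trick_or_treater(costume_description: str) -> str:
    # In the real project this is where a Gemini call would generate
    # a spooky, costume-specific greeting; here we keep it trivial.
    return f"Happy Halloween! Love the {costume_description} costume!"


if __name__ == "__main__":
    weave.init("halloween-skeleton")  # hypothetical project name
    print(greet_trick_or_treater("mad scientist"))
```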
    1 hr and 49 mins
  • 📅 ThursdAI - Oct 24 - Claude 3.5 controls your PC?! Talking AIs with 🦾, Multimodal Weave, Video Models mania + more AI news from this 🔥 week.
    Oct 25 2024
    Hey all, Alex here, coming to you from the (surprisingly) sunny Seattle, with just a mind-boggling week of releases. Really, just on Tuesday there was so much news already that I had to post a recap thread, something I usually do only after I finish ThursdAI! From Anthropic reclaiming the close-second, sometimes-first AI lab position and giving Claude the wheel in the form of computer use powers, to more than three AI video generation updates (including open source ones), to Apple updating the Apple Intelligence beta, it's honestly been very hard to keep up, and again, this is literally part of my job! But once again I'm glad that we were able to cover this in ~2 hrs, including multiple interviews with returning co-hosts (Simon Willison came back, Killian came back), so if you're only a reader at this point, definitely listen to the show! Ok, as always (recently), the TL;DR and show notes are at the bottom (I'm trying to get you to scroll through, ha, is it working?), so grab a bucket of popcorn and let's dive in 👇

    ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

    Claude's Big Week: Computer Control, Code Wizardry, and the Mysterious Case of the Missing Opus
    Anthropic dominated the headlines this week with a flurry of updates and announcements. Let's start with the new Claude Sonnet 3.5 (really, they didn't update the version number, it's still 3.5, though a different API model).

    Claude Sonnet 3.5: Coding Prodigy or Benchmark Buster?
    The new Sonnet model shows impressive results on coding benchmarks, surpassing even OpenAI's o1-preview on some. "It absolutely crushes coding benchmarks like Aider and SWE-bench Verified," I exclaimed on the show. But a closer look reveals a more nuanced picture. Mixed results on other benchmarks indicate that Sonnet 3.5 might not be the universal champion some anticipated. A friend of mine who runs held-back internal benchmarks was disappointed, highlighting weaknesses in scientific reasoning and certain writing tasks. Some folks are seeing it be lazier on some full code completions, while the maximum output length is now doubled from 4K to 8K tokens! This goes to show, again, that benchmarks don't tell the full story, so we wait for LMArena (formerly LMSys Arena) and the vibe checks from across the community. However, it absolutely dominates in code tasks; that much is clear already. There is a screenshot of the new model on the Aider code editing benchmark, a fairly reliable way to judge models' code output; they also have a code refactoring benchmark.

    Haiku 3.5 and the Vanishing Opus: Anthropic's Cryptic Clues
    Further adding to the intrigue, Anthropic announced Claude 3.5 Haiku! They usually provide immediate access, but Haiku remains elusive, said to be available by the end of the month, which is very, very soon. Making things even more curious, their highly anticipated Opus model has seemingly vanished from their website. "They've gone completely silent on 3.5 Opus," Simon Willison (𝕏) noted, mentioning conspiracy theories that this new Sonnet might simply be a rebranded Opus? 🕯️ 🕯️ We'll make a summoning circle for the new Opus and update you once it lands (maybe next year).

    Claude Takes Control (Sort Of): Computer Use API and the Dawn of AI Agents (𝕏)
    The biggest bombshell this week? Anthropic's Computer Use. This isn't just about executing code; it's about Claude interacting with computers: clicking buttons, browsing the web, and yes, even ordering pizza!

    Killian Lukas (𝕏), creator of Open Interpreter, returned to ThursdAI to discuss this groundbreaking development. "This stuff of computer use… it's the same argument for having humanoid robots: the web is human-shaped, and we need AIs to interact with computers and the web the way humans do," Killian explained, illuminating the potential for bridging the digital and physical worlds. Simon, though enthusiastic, provided a dose of realism: "It's incredibly impressive… but also very much a V1, beta." Having tackled the setup myself, I agree; the current reliance on a local Docker container and virtual machine introduces some complexity and security considerations. However, seeing Claude fix its own Docker installation error was an unforgettably mind-blowing experience. The future of AI agents is upon us, even if it's still a bit rough around the edges. Here's an easy guide to set it up yourself; it takes 5 minutes, requires no coding skills, and it's safely tucked away in a container. (A hedged API sketch follows after this episode's notes.)

    Big Tech's AI Moves: Apple Embraces ChatGPT, X.ai API (+Vision!?), and Cohere Multimodal Embeddings
    The rest of the AI world wasn't standing still. Apple made a surprising integration, while X.ai and Cohere pushed their platforms forward.

    Apple iOS 18.2 Beta: Siri Phones a Friend (ChatGPT)
    Apple, always cautious, surprisingly integrated ChatGPT directly into iOS. While Siri remains… well, Siri, users can now effortlessly offload more demanding tasks to ChatGPT. "Siri ...
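    For readers curious what Computer Use looks like from the API side, here is a minimal, hedged sketch using the anthropic Python SDK. The versioned tool type ("computer_20241022") and beta flag ("computer-use-2024-10-22") are the names from Anthropic's October 2024 announcement and may have changed since; treat this as an illustration rather than the official quickstart, which ships as the Docker container mentioned above.

```python
# Sketch only: asking Claude to plan computer actions via the Computer Use beta.
# The tool/beta identifiers below are the October 2024 versioned names and are
# assumptions to verify against the current Anthropic docs.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[
        {
            "type": "computer_20241022",
            "name": "computer",
            "display_width_px": 1024,
            "display_height_px": 768,
        }
    ],
    messages=[{"role": "user", "content": "Open the browser and search for pizza near me."}],
    betas=["computer-use-2024-10-22"],
)

# Claude replies with tool_use blocks describing clicks and keystrokes; a harness
# (e.g. the Dockerized demo) must execute them and send screenshots back.
for block in response.content:
    print(block.type, getattr(block, "input", None))
```

    The key design point is that the model never touches your machine directly: it only emits proposed actions, and the sandboxed container is what actually clicks and types, which is why the Docker setup matters for safety.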
    1 hr and 56 mins
  • 📆 ThursdAI - Oct 17 - Robots, Rockets, and Multi Modal Mania with open source voice cloning, OpenAI new voice API and more AI news
    Oct 18 2024
    Hey folks, Alex here from Weights & Biases, and this week has been absolutely bonkers. From robots walking among us to rockets landing on chopsticks (well, almost), the future is feeling palpably closer. And if real-world robots and reusable spaceship boosters weren't enough, the open-source AI community has been cooking, dropping new models and techniques faster than a Starship launch. So buckle up, grab your space helmet and noise-canceling headphones (we'll get to why those are important!), and let's blast off into this week's AI adventures!

    TL;DR and show notes + links at the end of the post 👇

    Robots and Rockets: A Glimpse into the Future
    I gotta start with the real-world stuff because, let's be honest, it's mind-blowing. We had Robert Scoble (yes, the Robert Scoble) join us after attending the Tesla We, Robot AI event, reporting on Optimus robots strolling through crowds, serving drinks, and generally being ridiculously futuristic. Autonomous robo-taxis were also cruising around, giving us a taste of a driverless future. Robert's enthusiasm was infectious: "It was a vision of the future, and from that standpoint, it succeeded wonderfully." I couldn't agree more. While the market might have had a mini-meltdown (apparently investors aren't ready for robot butlers yet), the sheer audacity of Tesla's vision is exhilarating. These robots aren't just cool gadgets; they represent a fundamental shift in how we interact with technology and the world around us. And they're learning fast. Just days after the event, Tesla released a video of Optimus operating autonomously, showcasing the rapid progress they're making.

    And speaking of audacious visions, SpaceX decided to one-up everyone (including themselves) by launching Starship and catching the booster with Mechazilla, their giant robotic chopsticks (okay, technically a launch tower, but you get the picture). Waking up early with my daughter to watch this live was pure magic. As Ryan Carson put it, "It was magical watching this… my kid who's 16… all of his friends are getting their imaginations lit by this experience." That's exactly what we need: more imagination and less doomerism! The future is coming whether we like it or not, and I, for one, am excited.

    Open Source LLMs and Tools: The Community Delivers (Again!)
    Okay, back to the virtual world (for now). This week's open-source scene was electric, with new model releases and tools that have everyone buzzing (and benchmarking like crazy!).
    * Nemotron 70B: Hype vs. Reality: NVIDIA dropped their Nemotron 70B instruct model, claiming impressive scores on certain benchmarks (Arena Hard, AlpacaEval), even suggesting it outperforms GPT-4 and Claude 3.5. As always, we take these claims with a grain of salt (remember Reflection?), and our resident expert, Nisten, was quick to run his own tests. The verdict? Nemotron is good, "a pretty good model to use," but maybe not the giant-killer some hyped it up to be. Still, kudos to NVIDIA for pushing the open-source boundaries. (Hugging Face, Harrison Kingsley evals)
    * Zamba 2: Hybrid Vigor: Zyphra, in collaboration with NVIDIA, released Zamba 2, a hybrid Sparse Mixture of Experts (SME) model. We had Paolo Glorioso, a researcher from Zyphra, join us to break down this unique architecture, which combines the strengths of transformers and state space models (SSMs). He highlighted the memory and latency advantages of SSMs, especially for on-device applications. Definitely worth checking out if you're interested in transformer alternatives and efficient inference.
    * Zyda 2: Data is King (and Queen): Alongside Zamba 2, Zyphra also dropped Zyda 2, a massive 5-trillion-token dataset, filtered, deduplicated, and ready for LLM training. This kind of open-source data release is a huge boon to the community, fueling the next generation of models. (X)
    * Ministral: Pocket-Sized Power: On the one-year anniversary of the iconic Mistral 7B release, Mistral announced two new smaller models, Ministral 3B and 8B. Designed for on-device inference, these models are impressive, but as always, Qwen looms large. While Mistral didn't include Qwen in their comparisons, early tests suggest Qwen's smaller models still hold their own. One point of contention: these Ministrals aren't as open source as the original 7B, which is a bit of a bummer, with the 3B not even being released anywhere besides their platform. (Mistral Blog)
    * Entropix (aka Shrek Sampler): Thinking Outside the (Sample) Box: This one is intriguing! Entropix introduces a novel sampling technique aimed at boosting the reasoning capabilities of smaller LLMs. Nisten's yogurt analogy explains it best: it's about "marinating" the information and picking the best "flavor" (token) at the end. Early examples look promising, suggesting Entropix could help smaller models tackle problems that even trip up their larger counterparts. But, as with all shiny new AI toys, we're eagerly awaiting robust evals (a toy sketch of the entropy-based idea follows after these notes). Tim...
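    To make the "marinating" analogy a bit more concrete, here is a toy Python sketch of the general entropy-aware idea behind samplers like Entropix: when the model's next-token distribution is confident (low entropy), take the top token; when it is uncertain, explore more. This is a simplification for illustration, not the actual Entropix implementation, which also looks at varentropy and attention statistics and can inject re-thinking tokens.

```python
# Toy sketch of entropy-aware sampling (the rough idea behind Entropix),
# not the project's actual algorithm. Thresholds are arbitrary for illustration.
import torch
import torch.nn.functional as F


def entropy_aware_sample(logits: torch.Tensor, low: float = 0.5, high: float = 3.0) -> int:
    """Pick the next token id based on how 'sure' the model is."""
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * torch.log(probs + 1e-9)).sum().item()  # in nats

    if entropy < low:
        # Model is confident: just take the argmax token.
        return int(torch.argmax(probs))
    if entropy > high:
        # Model is very unsure: sample with a higher temperature to explore
        # (Entropix instead resamples or injects a "think" token here).
        probs = F.softmax(logits / 1.5, dim=-1)
    return int(torch.multinomial(probs, num_samples=1))


# Example with fake logits over a 10-token vocabulary:
print(entropy_aware_sample(torch.randn(10)))
```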
    1 hr and 35 mins
