• 📆 ThursdAI - Jan 9th - NVIDIA's Tiny Supercomputer, Phi-4 is back, Kokoro TTS & Moondream gaze, ByteDance SOTA lip sync & more AI news
    Jan 10 2025
    Hey everyone, Alex here 👋
    This week's ThursdAI was a whirlwind of announcements, from Microsoft finally dropping Phi-4's official weights on Hugging Face (a month late, but who's counting?) to Sam Altman casually mentioning that OpenAI's got AGI in the bag and is now setting its sights on superintelligence. Oh, and NVIDIA? They're casually releasing a $3,000 supercomputer that can run 200B parameter models on your desktop. No big deal.
    We had some amazing guests this week too, with Oliver joining us to talk about a new foundation model in genomics and biosurveillance (yes, you read that right - think wastewater and pandemic monitoring!), and then, we've got some breaking news! Vik returned to the show with a brand new Moondream release that can do some pretty wild things. Ever wanted an AI to tell you where someone's looking in a photo? Now you can, thanks to a tiny model that runs on edge devices. 🤯
    So buckle up, folks, because we've got a ton to cover. Let's dive into the juicy details of this week's AI madness, starting with open source.
    03:10 TL;DR
    03:10 Deep Dive into Open Source LLMs
    10:58 MetaGene: A New Frontier in AI
    20:21 PHI4: The Latest in Open Source AI
    27:46 rStar-Math: Revolutionizing Small LLMs
    34:02 Big Companies and AI Innovations
    42:25 NVIDIA's Groundbreaking Announcements
    43:49 AI Hardware: Building and Comparing Systems
    46:06 NVIDIA's New AI Models: Llama Nemotron
    47:57 Breaking News: Moondream's Latest Release
    50:19 Moondream's Journey and Capabilities
    58:41 Weights & Biases: New Evals Course
    01:08:29 NVIDIA's World Foundation Models
    01:08:29 ByteDance's LatentSync: State-of-the-Art Lip Sync
    01:12:54 Kokoro TTS: High-Quality Text-to-Speech
    As always, TL;DR section with links and show notes below 👇
    Open Source AI & LLMs
    Phi-4: Microsoft's "Small" Model Finally Gets its Official Hugging Face Debut
    Finally, after a month, we're getting Phi-4 14B on Hugging Face. So far, we've had bootlegged copies of it, but it's finally officially uploaded by Microsoft. 
Not only is it now official, it's also officially MIT licensed, which is great!
    So, what's the big deal? Well, besides the licensing, it's a 14B parameter, dense decoder-only Transformer with a 16K token context length, trained on a whopping 9.8 trillion tokens. It scored 80.4 on math and 80.6 on MMLU, making it about 10% better than its predecessor, Phi-3, and better than Qwen 2.5's 79.
    What's interesting about Phi-4 is that the training data consisted of 40% synthetic data (almost half!).
    The vibes are always interesting with Phi models, so we'll keep an eye out. Also notable: the base models weren't released due to "safety issues", and this model was trained for single-turn use cases rather than multi-turn chat applications.
    MetaGene-1: AI for Pandemic Monitoring and Pathogen Detection
    Now, this one's a bit different. We usually talk about LLMs in this section, but this is more about the "open source" than the "LLM." Prime Intellect, along with folks from USC, released MetaGene-1, a metagenomic foundation model. That's a mouthful, right? Thankfully, we had Oliver Liu, a PhD student at USC and an author on this paper, join us to explain.
    Oliver clarified that the goal is to use AI for "biosurveillance, pandemic monitoring, and pathogen detection." They trained a 7B parameter model on 1.5 trillion base pairs of DNA and RNA sequences from wastewater, creating a model surprisingly capable of zero-shot embedding. Oliver pointed out that while using genomics to pretrain foundation models is not new, MetaGene-1 is, "in its current state, the largest model out there" and is "one of the few decoder only models that are being used". 
They have also collected 15T base pairs but trained on only 10% of them due to grant and compute constraints.
    I really liked this one, and though the science behind this was complex, I couldn't help but get excited about the potential of transformer models catching or helping catch the next COVID 👏
    rStar-Math: Making Small LLMs Math Whizzes with Monte Carlo Tree Search
    Alright, this one blew my mind. A paper from Microsoft (yeah, them again) called "rStar-Math" basically found a way to make small LLMs do math better than o1 using Monte Carlo Tree Search (MCTS). I know, I know, it sounds wild. They took models like Phi-3-mini (a tiny 3.8B parameter model) and Qwen 2.5 3B and 7B, slapped some MCTS magic on top, and suddenly these models are acing the AIME 2024 competition math benchmark and scoring 90% on general math problems. For comparison, OpenAI's o1-preview scores 85.5% on math and o1-mini scores 90%. This is WILD, as just 5 months ago, it was unimaginable that any LLM could solve math of this complexity, then reasoning models could, and now small LLMs with some MCTS can!
    Even crazier, they observed an "emergence of intrinsic self-reflection capability" in these models during problem-solving, something they weren't designed to do. LDJ chimed in saying "we're going to see more papers showing these things emerging and caught naturally." ...
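For readers who haven't met Monte Carlo Tree Search: the notes above don't say which selection rule rStar-Math uses, so what follows is only the textbook UCT rule, offered as an assumption about the general shape of MCTS rather than as the paper's method. At a node s (here, a partial solution), the search picks the next step a by trading off the average reward seen so far against how rarely that step has been tried:

```latex
a^{*} = \arg\max_{a} \left[ Q(s,a) + c \sqrt{\frac{\ln N(s)}{N(s,a)}} \right]
```

Here Q(s,a) is the mean reward observed after taking step a from s, N(s) and N(s,a) are visit counts, and c is an exploration constant. All of these symbols are introduced for illustration and do not come from the episode notes.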
    1 hr and 20 mins
  • 📆 ThursdAI - Jan 2 - is 25' the year of AI agents?
    Jan 2 2025
    Hey folks, Alex here 👋 Happy new year!
    On our first episode of this year, and the second quarter of this century, there wasn't a lot of AI news to report on (most AI labs were on a well deserved break). So this week, I'm very happy to present a special ThursdAI episode, an interview with João Moura, CEO of Crew.ai, all about AI agents!
    We first chatted with João a year ago, back in January of 2024, as CrewAI was blowing up but still just an open source project; it got to be the number 1 trending project on Github, and the #1 project on Product Hunt. (You can either listen to the podcast or watch it in the embedded Youtube above)
    00:36 Introduction and New Year Greetings
    02:23 Updates on Open Source and LLMs
    03:25 Deep Dive: AI Agents and Reasoning
    03:55 Quick TLDR and Recent Developments
    04:04 Medical LLMs and Modern BERT
    09:55 Enterprise AI and Crew AI Introduction
    10:17 Interview with João Moura: Crew AI
    25:43 Human-in-the-Loop and Agent Evaluation
    33:17 Evaluating AI Agents and LLMs
    44:48 Open Source Models and Fin to OpenAI
    45:21 Performance of Claude's Sonnet 3.5
    48:01 Different parts of an agent topology, brain, memory, tools, caching
    53:48 Tool Use and Integrations
    01:04:20 Removing LangChain from Crew
    01:07:51 The Year of Agents and Reasoning
    01:18:43 Addressing Concerns About AI
    01:24:31 Future of AI and Agents
    01:28:46 Conclusion and Farewell
    ---
    Is 2025 "the year of AI agents"?
    AI agents, as I remember them as a concept, started for me a few months after I started ThursdAI, when AutoGPT exploded. It was such a novel idea at the time: run LLM requests in a loop. (In fact, back then, I came up with a retry with AI concept and called it TrAI/Catch, where upon an error, I would feed that error back into the GPT API and ask it to correct itself. It feels so long ago!)
    AutoGPT became the fastest ever Github project to reach 100K stars, and while exciting, it did not work.
    Since then we saw multiple attempts at agentic frameworks, like babyAGI, autoGen. 
Crew AI was one of them, and it keeps being the favorite among many folks.
    So, what is an AI agent? Simon Willison, friend of the pod, has a mission: to ask everyone who announces a new agent what they mean when they say it, because it seems that everyone "shares" a common understanding of AI agents, but it's different for everyone.
    We'll start with João's explanation and go from there. But let's assume the basics: it's a set of LLM calls, running in a self-correcting loop, with access to planning, external tools (via function calling) and a memory of sorts, that makes decisions.
    Though, as we go into detail, you'll see that since the very basic "run LLM in the loop" days, the agents in 2025 have evolved and have a lot of complexity.
    My takeaways from the conversation
    I encourage you to listen / watch the whole interview, João is deeply knowledgeable about the field and we go into a lot of topics, but here are my main takeaways from our chat
    * Enterprises are adopting agents, starting with internal use-cases
    * Crews have 4 different kinds of memory: long term (across runs), short term (each run), entity memory (company names, entities), pre-existing knowledge (DNA?)
    * TIL about a "do all links respond with 200" guardrail
    * Some of the agent tools we mentioned
      * Stripe Agent API - for agent payments and access to payment data (blog)
      * Okta Auth for Gen AI - agent authentication and role management (blog)
      * E2B - code execution platform for agents (e2b.dev)
      * BrowserBase - programmatic web-browser for your AI agent
      * Exa - search grounding for agents for real time understanding
    * Crew has 13 crews that run 24/7 to automate their company
      * Crews like Onboarding User Enrichment Crew, Meetings Prep, Taking Phone Calls, Generate Use Cases for Leads
    * GPT-4o mini was the most used model of 2024 for CrewAI, with the main factors being speed / cost
    * Speed of AI development makes it hard to standardize and solidify common integrations.
    * Reasoning models like o1 still haven't seen a lot of success, partly due to speed, partly due to the different way of prompting required.
    This week's Buzz
    We've just opened up pre-registration for our upcoming FREE evaluations course, featuring Paige Bailey from Google and Graham Neubig from All Hands AI (previously Open Devin). We've distilled a lot of what we learned about evaluating LLM applications while building Weave, our LLM Observability and Evaluation tooling, and are excited to share this with you all! Get on the list
    Also, 2 workshops (also about Evals) from us are upcoming, one in SF on Jan 11th and one in Seattle on Jan 13th (which I'm going to lead!) so if you're in those cities at those times, would love to see you!
    And that's it for this week, there wasn't a LOT of news as I said. The interesting thing is, even in the very short week, the news that we did get was all about agents and reasoning, so it looks like 2025 is agents and reasoning, agents and reasoning!
    See you all next week 🫡
    TL;DR with links:
    * Open Source LLMs
    * HuatuoGPT-o1 - medical LLM designed for medical ...
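The "retry with AI" idea from these notes (feed the error back to the model and ask it to correct itself) is the smallest version of the self-correcting loop described above, and it can be sketched in a few lines. This is a minimal sketch under assumptions, not the original TrAI/Catch code or CrewAI's implementation: `run_with_ai_retry`, `make_fake_llm` and the prompt wording are hypothetical names invented for illustration, and the "LLM" is a stub so the example runs without any API.

```python
from typing import Callable


def run_with_ai_retry(
    task: str,
    llm: Callable[[str], str],       # hypothetical: any function mapping prompt -> reply
    execute: Callable[[str], int],   # hypothetical: runs/validates the reply, may raise
    max_attempts: int = 3,
) -> int:
    """Ask the model, try to use its reply, and on failure feed the error back."""
    prompt = task
    last_error = None
    for _ in range(max_attempts):
        candidate = llm(prompt)
        try:
            return execute(candidate)
        except Exception as err:
            last_error = err
            # The self-correction step: the next prompt carries the failed
            # reply and the error it produced.
            prompt = (
                f"{task}\n\nYour previous answer was:\n{candidate}\n\n"
                f"It failed with: {err!r}\nPlease correct it."
            )
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


def make_fake_llm() -> Callable[[str], str]:
    """Stub model: answers badly once, then correctly (stands in for a real API)."""
    replies = iter(["forty-two", "42"])
    return lambda prompt: next(replies)


# int() rejects "forty-two", so the loop retries with the error in the prompt.
answer = run_with_ai_retry("What is 6 * 7? Reply with digits only.", make_fake_llm(), int)
print(answer)
```

A real agent framework layers planning, tools and memory on top of this, but the retry loop is the part that makes it "self correcting". Capping `max_attempts` matters because each retry is another paid model call.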
    1 hr and 31 mins
  • 📆 ThursdAI - Dec 26 - OpenAI o3 & o3 mini, DeepSeek v3 658B beating Claude, Qwen Visual Reasoning, Hume OCTAVE & more AI news
    Dec 27 2024
    Hey everyone, Alex here 👋I was hoping for a quiet holiday week, but whoa, while the last newsletter was only a week ago, what a looong week it has been, just Friday after the last newsletter, it felt like OpenAI has changed the world of AI once again with o3 and left everyone asking "was this AGI?" over the X-mas break (Hope Santa brought you some great gifts!) and then not to be outdone, DeepSeek open sourced basically a Claude 2.5 level behemoth DeepSeek v3 just this morning!Since the breaking news from DeepSeek took us by surprise, the show went a bit longer (3 hours today!) than expected, so as a Bonus, I'm going to release a separate episode with a yearly recap + our predictions from last year and for next year in a few days (soon in your inbox!) TL;DR* Open Source LLMs* CogAgent-9B (Project, Github)* Qwen QvQ 72B - open weights visual reasoning (X, HF, Demo, Project)* GoodFire Ember - MechInterp API - GoldenGate LLama 70B* 🔥 DeepSeek v3 658B MoE - Open Source Claude level model at $6M (X, Paper, HF, Chat)* Big CO LLMs + APIs* 🔥 OpenAI reveals o3 and o3 mini (Blog, X)* X.ai raises ANOTHER 6B dollars - on their way to 200K H200s (X)* This weeks Buzz* Two W&B workshops upcoming in January* SF - January 11* Seattle - January 13 (workshop by yours truly!)* New Evals course with Paige Bailey and Graham Neubig - pre-sign up for free* Vision & Video* Kling 1.6 update (Tweet)* Voice & Audio* Hume OCTAVE - 3B speech-language model (X, Blog)* Tools* OpenRouter added Web Search Grounding to 300+ models (X)Open Source LLMsDeepSeek v3 658B - frontier level open weights model for ~$6M (X, Paper, HF, Chat )This was absolutely the top of the open source / open weights news for the past week, and honestly maybe for the past month. DeepSeek, the previous quant firm from China, has dropped a behemoth model, a 658B parameter MoE (37B active), that you'd need 8xH200 to even run, that beats Llama 405, GPT-4o on most benchmarks and even Claude Sonnet 3.5 on several evals! 
The vibes seem to be very good with this one, and while it's not all the way beating Claude yet, it's nearly up there already, but the kicker is, they trained it with very restricted compute, per the paper, with ~2K H800 (which is like H100 but with less bandwidth) for 14.8T tokens. (that's 15x cheaper than LLama 405 for comparison) For evaluations, this model excels on Coding and Math, which is not surprising given how excellent DeepSeek coder has been, but still, very very impressive! On the architecture front, the very interesting thing is, this feels like Mixture of Experts v2, with a LOT of experts (256) and 8+1 active at the same time, multi token prediction, and a lot of optimization tricks outlined in the impressive paper (here's a great recap of the technical details)
    The highlight for me was that DeepSeek is distilling their recent R1 version into this version, which likely increases the performance of this model on Math and Code, in which it absolutely crushes (51.6 on CodeForces and 90.2 on MATH-500). The additional aspect of this is the API costs, and while they are going to raise the prices come February (they literally just swapped v2.5 for v3 in their APIs without telling a soul lol), the price performance for this model is just absurd. Just a massive massive release from the WhaleBros, now I just need a quick 8xH200 to run this and I'm good 😅
    Other OpenSource news - Qwen QvQ, CogAgent-9B and GoldenGate LLama
    In other open source news this week, our friends from Qwen have released a very interesting preview, called Qwen QvQ, a visual reasoning model. It uses the same reasoning techniques that we got from them in QwQ 32B, but built with the excellent Qwen VL, to reason about images, and frankly, it's really fun to see it think about an image. You can try it here
    There's also a new update to CogAgent-9B (page), an agent that claims to understand and control your computer, and claims to beat Claude 3.5 Sonnet Computer Use with just a 9B model! 
This is very impressive, though I haven't tried it just yet. I'm excited to see those very impressive numbers from open source VLMs driving your computer and doing tasks for you!
    A super quick word from ... Weights & Biases!
    We've just opened up pre-registration for our upcoming FREE evaluations course, featuring Paige Bailey from Google and Graham Neubig from All Hands AI. We've distilled a lot of what we learned about evaluating LLM applications while building Weave, our LLM Observability and Evaluation tooling, and are excited to share this with you all! Get on the list
    Also, 2 workshops (also about Evals) from us are upcoming, one in SF on Jan 11th and one in Seattle on Jan 13th (which I'm going to lead!) so if you're in those cities at those times, would love to see you!
    Big Companies - APIs & LLMs
    OpenAI - introduces o3 and o3-mini - breaking Arc-AGI challenge, GPQA and teasing AGI?
    On the last day of the 12 days of OpenAI, we've got the evals of their upcoming o3 reasoning model (and o3-mini) and whoa. ...
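To make the DeepSeek v3 "Mixture of Experts" description earlier in these notes more concrete (256 experts, 8+1 active per token), here is a rough sketch of top-k expert routing. It is a generic softmax-over-top-k gate written from scratch, not necessarily the gating function DeepSeek uses; reading the "+1" as an always-on shared expert is an assumption, and `route_token` plus the random router scores are made up for illustration.

```python
import math
import random

NUM_ROUTED_EXPERTS = 256  # expert count quoted in the notes
TOP_K = 8                 # routed experts active per token, per the notes
# Assumption: the "+1" is a shared expert that every token passes through,
# so it needs no routing decision and is not modelled below.


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k highest-scoring experts and normalise their weights to sum to 1."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:top_k]
    peak = max(router_logits[i] for i in top)           # subtract max for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Hypothetical router scores for a single token (a real model learns these).
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_ROUTED_EXPERTS)]
chosen = route_token(logits)

# Fraction of parameters touched per token, using the figures quoted in the notes.
active_fraction = 37 / 658

for expert, weight in chosen:
    print(f"expert {expert:3d} weight {weight:.3f}")
print(f"active parameter fraction: {active_fraction:.1%}")
```

The point of the design is in the last line: each token only pays for the handful of experts it is routed to, which is how a model can have hundreds of billions of parameters while running only tens of billions per token.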
    1 hr and 36 mins
  • 🎄ThursdAI - Dec19 - o1 vs gemini reasoning, VEO vs SORA, and holiday season full of AI surprises
    Dec 20 2024
    For the full show notes and links visit https://sub.thursdai.news🔗 Subscribe to our show on Spotify: https://thursdai.news/spotify🔗 Apple: https://thursdai.news/appleHo, ho, holy moly, folks! Alex here, coming to you live from a world where AI updates are dropping faster than Santa down a chimney! 🎅 It's been another absolutely BANANAS week in the AI world, and if you thought last week was wild, and we're due for a break, buckle up, because this one's a freakin' rollercoaster! 🎢In this episode of ThursdAI, we dive deep into the recent innovations from OpenAI, including their 1-800 ChatGPT phone service and new advancements in voice mode and API functionalities. We discuss the latest updates on O1 model capabilities, including Reasoning Effort settings, and highlight the introduction of WebRTC support by OpenAI. Additionally, we explore the groundbreaking VEO2 model from Google, the generative physics engine Genesis, and new developments in open source models like Cohere's Command R7b. We also provide practical insights on using tools like Weights & Biases for evaluating AI models and share tips on leveraging GitHub Gigi. 
Tune in for a comprehensive overview of the latest in AI technology and innovation.00:00 Introduction and OpenAI's 12 Days of Releases00:48 Advanced Voice Mode and Public Reactions01:57 Celebrating Tech Innovations02:24 Exciting New Features in AVMs03:08 TLDR - ThursdAI December 1912:58 Voice and Audio Innovations14:29 AI Art, Diffusion, and 3D16:51 Breaking News: Google Gemini 2.023:10 Meta Apollo 7b Revisited33:44 Google's Sora and Veo234:12 Introduction to Veo2 and Sora34:59 First Impressions of Veo235:49 Comparing Veo2 and Sora37:09 Sora's Unique Features38:03 Google's MVP Approach43:07 OpenAI's Latest Releases44:48 Exploring OpenAI's 1-800 CHAT GPT47:18 OpenAI's Fine-Tuning with DPO48:15 OpenAI's Mini Dev Day Announcements49:08 Evaluating OpenAI's O1 Model54:39 Weights & Biases Evaluation Tool - Weave01:03:52 ArcAGI and O1 Performance01:06:47 Introduction and Technical Issues01:06:51 Efforts on Desktop Apps01:07:16 ChatGPT Desktop App Features01:07:25 Working with Apps and Warp Integration01:08:38 Programming with ChatGPT in IDEs01:08:44 Discussion on Warp and Other Tools01:10:37 GitHub GG Project01:14:47 OpenAI Announcements and WebRTC01:24:45 Modern BERT and Smaller Models01:27:37 Genesis: Generative Physics Engine01:33:12 Closing Remarks and Holiday WishesHere’s a talking podcast host speaking excitedly about his showTL;DR - Show notes and Links* Open Source LLMs* Meta Apollo 7B – LMM w/ SOTA video understanding (Page, HF)* Microsoft Phi-4 – 14B SLM (Blog, Paper)* Cohere Command R 7B – (Blog)* Falcon 3 – series of models (X, HF, web)* IBM updates Granite 3.1 + embedding models (HF, Embedding)* Big CO LLMs + APIs* OpenAI releases new o1 + API access (X)* Microsoft makes CoPilot Free! 
(X)* Google - Gemini Flash 2 Thinking experimental reasoning model (X, Studio)* This weeks Buzz* W&B weave Playground now has Trials (and o1 compatibility) (try it)* Alex Evaluation of o1 and Gemini Thinking experimental (X, Colab, Dashboard)* Vision & Video* Google releases Veo 2 – SOTA text2video modal - beating SORA by most vibes (X)* HunyuanVideo distilled with FastHunyuan down to 6 steps (HF)* Kling 1.6 (X)* Voice & Audio* OpenAI realtime audio improvements (docs)* 11labs new Flash 2.5 model – 75ms generation (X)* Nexa OmniAudio – 2.6B – multimodal local LLM (Blog)* Moonshine Web – real time speech recognition in the browser (X)* Sony MMAudio - open source video 2 audio model (Blog, Demo)* AI Art & Diffusion & 3D* Genesys – open source generative 3D physics engine (X, Site, Github)* Tools* CerebrasCoder – extremely fast apps creation (Try It)* RepoPrompt to chat with o1 Pro – (download) This is a public episode. If you’d like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
    1 hr and 36 mins
  • 📆 ThursdAI - Dec 12 - unprecedented AI week - SORA, Gemini 2.0 Flash, Apple Intelligence, LLama 3.3, NeurIPS Drama & more AI news
    Dec 13 2024
    Hey folks, Alex here, writing this from beautiful Vancouver BC, Canada. I'm here for NeurIPS 2024, the biggest ML conference of the year, and let me tell you, this was one hell of a week to not be glued to the screen. After last week's banger week, with OpenAI kicking off their 12 days of releases by releasing o1 full and pro mode during ThursdAI, things went parabolic. It seems that all the AI labs decided to just dump EVERYTHING they have before the holidays? 🎅
    A day after our show, on Friday, Google announced a new Gemini 1206 that became the #1 leading model on LMarena and Meta released LLama 3.3, then on Saturday xAI released their new image model code named Aurora.
    On a regular week, the above Fri-Sun news would be enough for a full 2 hour ThursdAI show on its own, but not this week. This week it was barely a 15 minute segment 😅 because so MUCH happened starting Monday that we were barely able to catch our breath, so let's dive into it!
    As always, the TL;DR and full show notes are at the end 👇 and this newsletter is sponsored by W&B Weave. If you're building with LLMs in production, and want to switch to the new Gemini 2.0 today, how will you know if your app is not going to degrade? Weave is the best way! Give it a try for free.
    Gemini 2.0 Flash - a new gold standard of fast multimodal LLMs
    Believe it or not, Google has absolutely taken the crown away from OpenAI this week with the incredible Gemini 2.0 release. All of us on the show were in agreement that this is a phenomenal release from Google for the 1 year anniversary of Gemini. Gemini 2.0 Flash is beating Pro 002 and Flash 002 on all benchmarks, while being 2x faster than Pro, having a 1M context window, and being fully multimodal!
    Multimodality on input and output
    This model was announced to be fully multimodal on inputs AND outputs, which means it can natively understand text, images, audio, video, documents and output text, text + images and audio (so it can speak!). 
Some of these capabilities are restricted to beta users for now, but we know they exist. If you remember Project Astra, this is what powers that project. In fact, we had Matt Wolfe join the show; he had early access to Project Astra, which is powered by Gemini 2.0 Flash, and demoed it live on the show (see above). The most amazing thing is, this functionality, which was presented to us just 8 months ago at Google I/O in a premium Booth experience, is now available to all, in Google AI Studio, for free! Really, you can try it out right now, yourself, at https://aistudio.google.com/live but here's a demo of it, helping me proofread this exact paragraph by watching the screen and talking me through it.
    Performance out of the box
    This model beating Sonnet 3.5 on SWE-bench Verified completely blew away the narrative on my timeline, nobody was ready for that. This is a Flash model that's outperforming o1 on code!?
    So having a Flash MMIO model with 1M context that is accessible via APIs, with a real time streaming option, from release time is honestly quite amazing to begin with, not to mention that during the preview phase this is currently free. If we consider the previous prices of Flash, this model is going to considerably undercut the market on the price/performance/speed matrix. You can see why this release is taking the crown this week. 👏
    Agentic is coming with Project Mariner
    An additional thing that was announced by Google is their agentic approach, Project Mariner, which is an agent in the form of a Chrome extension completing web tasks, breaking SOTA on WebVoyager with an 83.5% score in a single agent setup. We've seen agent attempts from Adept to Claude Computer Use to Runner H, but this SOTA-breaking result from Google seems very promising. Can't wait to give this a try. 
    OpenAI gives us SORA, Vision and other stuff from the bag of goodies
    Ok so now let's talk about the second winner of this week, OpenAI's amazing stream of innovations, which would have taken the crown, if not for, well... ☝️
    SORA is finally here (for those who got in)
    OpenAI has FINALLY released SORA, their long promised text to video and image to video (and video to video) model (née world simulator) to general availability, including a new website - sora.com and a completely amazing UI to come with it. SORA can generate videos of various quality from 480p up to 1080p and up to 20 seconds long, and they promised that those will be generating fast, as what they released is actually SORA turbo! (apparently SORA 2 is already in the works and will be even more amazing, more on this later)
    New accounts paused for now
    OpenAI seems to have severely underestimated how many people would like to generate the 50 videos per month allowed on the plus account (pro account gets you 10x more for $200 + longer durations whatever that means), and as of the time of writing these words on ThursdAI afternoon, I still am not able to create a sora.com account and try out SORA myself (...
    1 hr and 39 mins
  • 📆 ThursdAI - Dec 5 - OpenAI o1 & o1 pro, Tencent HY-Video, FishSpeech 1.5, Google GENIE2, Weave in GA & more AI news
    Dec 6 2024
    Well well well, December is finally here, we're about to close out this year (and have just flew by the second anniversary of chatGPT 🎂) and it seems that all of the AI labs want to give us X-mas presents to play with over the holidays! Look, I keep saying this, but weeks are getting crazier and crazier, this week we got the cheapest and the most expensive AI offerings all at once (the cheapest from Amazon and the most expensive from OpenAI), 2 new open weights models that beat commercial offerings, a diffusion model that predicts the weather and 2 world building models, oh and 2 decentralized fully open sourced LLMs were trained across the world LIVE and finished training. I said... crazy week! And for W&B, this week started with Weave launching finally in GA 🎉, which I personally was looking forward for (read more below)!TL;DR Highlights* OpenAI O1 & Pro Tier: O1 is out of preview, now smarter, faster, multimodal, and integrated into ChatGPT. For heavy usage, ChatGPT Pro ($200/month) offers unlimited calls and O1 Pro Mode for harder reasoning tasks.* Video & Audio Open Source Explosion: Tencent’s HYVideo outperforms Runway and Luma, bringing high-quality video generation to open source. Fishspeech 1.5 challenges top TTS providers, making near-human voice available for free research.* Open Source Decentralization: Nous Research’s DiStRo (15B) and Prime Intellect’s INTELLECT-1 (10B) prove you can train giant LLMs across decentralized nodes globally. Performance is on par with centralized setups.* Google’s Genie 2 & WorldLabs: Generating fully interactive 3D worlds from a single image, pushing boundaries in embodied AI and simulation. Google’s GenCast also sets a new standard in weather prediction, beating supercomputers in accuracy and speed.* Amazon’s Nova FMs: Cheap, scalable LLMs with huge context and global language coverage. 
Perfect for cost-conscious enterprise tasks, though not top on performance.* 🎉 Weave by W&B: Now in GA, it’s your dashboard and tool suite for building, monitoring, and scaling GenAI apps. Get Started with 1 line of codeOpenAI’s 12 Days of Shipping: O1 & ChatGPT ProThe biggest splash this week came from OpenAI. They’re kicking off “12 days of launches,” and Day 1 brought the long-awaited full version of o1. The main complaint about o1 for many people is how slow it was! Well, now it’s not only smarter but significantly faster (60% faster than preview!), and officially multimodal: it can see images and text together.Better yet, OpenAI introduced a new ChatGPT Pro tier at $200/month. It offers unlimited usage of o1, advanced voice mode, and something called o1 pro mode — where o1 thinks even harder and longer about your hardest math, coding, or science problems. For power users—maybe data scientists, engineers, or hardcore coders—this might be a no-brainer. For others, 200 bucks might be steep, but hey, someone’s gotta pay for those GPUs. Given that OpenAI recently confirmed that there are now 300 Million monthly active users on the platform, and many of my friends already upgraded, this is for sure going to boost the bottom line at OpenAI! Quoting Sam Altman from the stream, “This is for the power users who push the model to its limits every day.” For those who complained o1 took forever just to say “hi,” rejoice: trivial requests will now be answered quickly, while super-hard tasks get that legendary deep reasoning including a new progress bar and a notification when a task is complete. Friend of the pod Ray Fernando gave pro a prompt that took 7 minutes to think through! I've tested the new o1 myself, and while I've gotten dangerously close to my 50 messages per week quota, I've gotten some incredible results already, and very fast as well. 
This ice-cubes question failed o1-preview and o1-mini and it took both of them significantly longer, and it took just 4 seconds for o1. Open Source LLMs: Decentralization & Transparent ReasoningNous Research DiStRo & DeMo OptimizerWe’ve talked about decentralized training before, but the folks at Nous Research are making it a reality at scale. This week, Nous Research wrapped up the training of a new 15B-parameter LLM—codename “Psyche”—using a fully decentralized approach called “Nous DiStRo.” Picture a massive AI model trained not in a single data center, but across GPU nodes scattered around the globe. According to Alex Volkov (host of ThursdAI), “This is crazy: they’re literally training a 15B param model using GPUs from multiple companies and individuals, and it’s working as well as centralized runs.”The key to this success is “DeMo” (Decoupled Momentum Optimization), a paper co-authored by none other than Diederik Kingma (yes, the Kingma behind Adam optimizer and VAEs). DeMo drastically reduces communication overhead and still maintains stability and speed. The training loss curve they’ve shown looks just as good as a normal centralized run, proving that ...
    1 hr and 32 mins
  • 🦃 ThursdAI - Thanksgiving special 24' - Qwen Open Sources Reasoning, BlueSky hates AI, H controls the web & more AI news
    Nov 28 2024
    Hey y'all, Happy Thanksgiving to everyone who celebrates and thank you for being a subscriber, I truly appreciate each and every one of you!
    We had a blast on today's celebratory stream, especially given that today's "main course" was the amazing open sourcing of a reasoning model from Qwen, and we had Junyang Lin with us again to talk about it! It's the first open source reasoning model that you can run on your machine, it beats a 405B model, and it comes close to o1 on some metrics 🤯
    We also chatted about a new hybrid approach from Nvidia called Hymba 1.5B (Paper, HF) that beats Qwen 1.5B with 6-12x less training, and Allen AI releasing Olmo 2, which became the best fully open source LLM 👏 (Blog, HF, Demo), though they didn't release WandB logs this time, they did release data!
    I encourage you to watch today's show (or listen to the show, I don't judge), there's not going to be a long writeup like I usually do, as I want to go and enjoy the holiday too, but of course, the TL;DR and show notes are right here so you won't miss a beat if you want to use the break to explore and play around with a few things!
    TL;DR and show notes
    * Qwen QwQ 32B preview - the first open weights reasoning model (X, Blog, HF, Try it)
    * Allen AI - Olmo 2 the best fully open language model (Blog, HF, Demo)
    * NVIDIA Hymba 1.5B - Hybrid smol model beating Qwen, SmolLM w/ 6-12x less training (X, Paper, HF)
    * Big CO LLMs + APIs
    * Anthropic MCP - model context protocol (X, Blog, Spec, Explainer)
    * Cursor, Jetbrains now integrate with ChatGPT MacOS app (X)
    * Xai is going to be a Gaming company?! 
(X)* H company shows Runner H - WebVoyager Agent (X, Waitlist) * This weeks Buzz* Interview w/ Thomas Cepelle about Weave scorers and guardrails (Guide)* Vision & Video* OpenAI SORA API was "leaked" on HuggingFace (here)* Runway launches video Expand feature (X)* Rhymes Allegro-TI2V - updated image to video model (HF)* Voice & Audio* OuteTTS v0.2 - 500M smol TTS with voice cloning (Blog, HF)* AI Art & Diffusion & 3D* Runway launches an image model called Frames (X, Blog)* ComfyUI Desktop app was released 🎉* Chat* 24 hours of AI hate on 🦋 (thread)* Tools* Cursor agent (X thread)* Google Generative Chess toy (Link)See you next week and happy Thanks Giving 🦃Thanks for reading ThursdAI - Recaps of the most high signal AI weekly spaces! This post is public so feel free to share it.Full Subtitles for convenience[00:00:00] Alex Volkov: let's get it going.[00:00:10] Alex Volkov: Welcome, welcome everyone to ThursdAI November 28th Thanksgiving special. My name is Alex Volkov. I'm an AI evangelist with Weights Biases. You're on ThursdAI. We are live [00:00:30] on ThursdAI. Everywhere pretty much.[00:00:32] Alex Volkov:[00:00:32] Hosts and Guests Introduction[00:00:32] Alex Volkov: I'm joined here with two of my co hosts.[00:00:35] Alex Volkov: Wolfram, welcome.[00:00:36] Wolfram Ravenwolf: Hello everyone! Happy Thanksgiving![00:00:38] Alex Volkov: Happy Thanksgiving, man.[00:00:39] Alex Volkov: And we have Junyang here. Junyang, welcome, man.[00:00:42] Junyang Lin: Yeah, hi everyone. Happy Thanksgiving. Great to be here.[00:00:46] Alex Volkov: You had a busy week. We're going to chat about what you had. I see Nisten joining us as well at some point.[00:00:51] Alex Volkov: Yam pe joining us as well. Hey, how, Hey Yam. Welcome. Welcome, as well. Happy Thanksgiving. It looks like we're assembled folks. 
We're across streams, across [00:01:00] countries, but we are. [00:01:01] Overview of Topics for the Episode [00:01:01] Alex Volkov: For November 28th, we have a bunch of stuff to talk about. Like really a big list of stuff to talk about. So why don't we just we'll just dive in. We'll just dive in. So obviously I think the best and the most important. [00:01:13] DeepSeek and Qwen Open Source AI News [00:01:13] Alex Volkov: Open source kind of AI news to talk about this week is going to be, and I think I remember last week, Junyang, I asked you about this and you were like, you couldn't say anything, but I asked because last week, folks, if you remember, we talked about R1 from DeepSeek, a reasoning model from [00:01:30] DeepSeek, which really said, Oh, maybe it comes as a, as open source and maybe it doesn't. [00:01:33] Alex Volkov: And I hinted about, and I asked, Junyang, what about some reasoning from you guys? And you couldn't say anything. So this week. I'm going to do a TLDR. So we're going to actually talk about the stuff that, you know, in depth a little bit later, but this week, obviously one of the biggest kind of open source or sorry, open weights, and news is coming from our friends at Qwen as well, as we always celebrate. [00:01:56] Alex Volkov: So one of the biggest things that we get as. [00:02:00] is, Qwen releases, I will actually have you tell me what's the pronunciation ...
    1 hr and 46 mins
  • 📆 ThursdAI - Nov 21 - The fight for the LLM throne, OSS SOTA from AllenAI, Flux new tools, Deepseek R1 reasoning & more AI news
    Nov 22 2024
    Hey folks, Alex here, and oof what a 🔥🔥🔥 show we had today! I got to use my new breaking news button 3 times this show! And not only that, some of you may know that one of the absolutely biggest pleasures as a host is to feature the folks who actually make the news on the show! And now that we're in video format, you actually get to see who they are! So this week I was honored to welcome back our friend and co-host Junyang Lin, a Dev Lead from the Alibaba Qwen team, who came back after launching the incredible Qwen Coder 2.5, and Qwen 2.5 Turbo with 1M context. We also had breaking news on the show that AI2 (Allen Institute for AI) has fully released SOTA Llama post-trained models, and I was very lucky to get the core contributor on the paper, Nathan Lambert, to join us live and tell us all about this amazing open source effort! You don't want to miss this conversation! Lastly, we chatted with the CEO of StackBlitz, Eric Simons, about the absolutely incredible lightning-in-a-bottle success of their latest bolt.new product, and how it opens a new category of code-generation tools. 00:00 Introduction and Welcome 00:58 Meet the Hosts and Guests 02:28 TLDR Overview 03:21 TL;DR 04:10 Big Companies and APIs 07:47 Agent News and Announcements 08:05 Voice and Audio Updates 08:48 AR, Art, and Diffusion 11:02 Deep Dive into Mistral and Pixtral 29:28 Interview with Nathan Lambert from AI2 30:23 Live Reaction to Tulu 3 Release 30:50 Deep Dive into Tulu 3 Features 32:45 Open Source Commitment and Community Impact 33:13 Exploring the Released Artifacts 33:55 Detailed Breakdown of Datasets and Models 37:03 Motivation Behind Open Source 38:02 Q&A Session with the Community 38:52 Summarizing Key Insights and Future Directions 40:15 Discussion on Long Context Understanding 41:52 Closing Remarks and Acknowledgements 44:38 Transition to Big Companies and APIs 45:03 Weights & Biases: This Week's Buzz 01:02:50 Mistral's New Features and Upgrades 01:07:00 Introduction to DeepSeek and the Whale 
Giant 01:07:44 DeepSeek's Technological Achievements 01:08:02 Open Source Models and API Announcement 01:09:32 DeepSeek's Reasoning Capabilities 01:12:07 Scaling Laws and Future Predictions 01:14:13 Interview with Eric from Bolt 01:14:41 Breaking News: Gemini Experimental 01:17:26 Interview with Eric Simons - CEO @ Stackblitz 01:19:39 Live Demo of Bolt's Capabilities 01:36:17 Black Forest Labs AI Art Tools 01:40:45 Conclusion and Final Thoughts As always, the show notes and TL;DR with all the links I mentioned on the show and the full news roundup are below the main news recap 👇 Google & OpenAI fighting for the LMArena crown 👑 I wanted to open with this, as last week I reported that Gemini Exp 1114 had taken over #1 in the LMArena. In less than a week, we saw a new ChatGPT release, called GPT-4o-2024-11-20, reclaim the arena #1 spot! Focusing specifically on creative writing, this new model, now deployed on chat.com and in the API, is definitely more creative according to many folks who've tried it, with OpenAI employees saying "expect qualitative improvements with more natural and engaging writing, thoroughness and readability", and indeed that's what my feed was reporting as well. I also wanted to mention here that we've seen this happen once before: last time Gemini peaked at the LMArena, it took less than a week for OpenAI to release and test a model that beat it. But not this time, this time Google came prepared with an answer! Just as we were wrapping up the show (again, Logan apparently loves dropping things at the end of ThursdAI), we got breaking news that there is YET another experimental model from Google, called Gemini Exp 1121, and apparently, it reclaims the stolen #1 position that ChatGPT reclaimed from Gemini... yesterday! 
Or at least joins it at #1. LMArena Fatigue? Many folks in my DMs are getting a bit frustrated with these marketing tactics, not only with the fact that we're getting experimental models faster than we can test them, but also with the fact that, if you think about it, this was probably a calculated move by Google. Release a very powerful checkpoint, knowing that this will trigger a response from OpenAI, but don't release your most powerful one. OpenAI predictably releases their own "ready to go" checkpoint to show they are ahead, then folks at Google wait and release what they wanted to release in the first place. The other frustration point is the over-indexing of the major labs on the LMArena human metrics as the closest approximation for "best". For example, here's some analysis from Artificial Analysis showing that while the latest ChatGPT is indeed better at creative writing (and #1 in the Arena, where humans vote answers against each other), it's gotten actively worse at MATH and coding compared to the August version (which could be a result of it being a distilled, much smaller version). In summary, maybe the LMArena is no longer "1 arena is all you need", but the competition at the TOP scores of the Arena has never been ...
    1 hr and 45 mins