AI Papers Podcast

Episodes

Improving Agent Design, JPEG-LM's Visual Breakthrough, TurboEdit's Real-Time Image Edits, Video Segmentation Advances, LLMs Learning Like Humans, RL Benchmarks

Aug 21 2024

xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
JPEG-LM: LLMs as Image Generators with Canonical Codec Representations
Automated Design of Agentic Systems
TurboEdit: Instant text-based image editing
Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning
Fine-tuning Large Language Models with Human-inspired Learning Strategies in Medical Question Answering
D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning

16 mins

Science & Clinical LLMs Leaps, Enhancing Small Model Reasoning, New Frontiers in Controlled Media Generation

Aug 16 2024

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
Med42-v2: A Suite of Clinical LLMs
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
ControlNeXt: Powerful and Efficient Control for Image and Video Generation
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
FruitNeRF: A Unified Neural Radiance Field based Fruit Counting Framework
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents

14 mins

Multimodal Benchmarks, Visual Task Transfer, and 3D Object Generation

Aug 8 2024

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
LLaVA-OneVision: Easy Visual Task Transfer
An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion
MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine
IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Diffusion Models as Data Mining Tools

14 mins

Image and Video Segmentation with SAM 2, Gemma 2 for Efficient Language Models, Boosting Small Models with Contrastive Fine-Tuning, and MM-Vet v2 Challenges Large Multimodal Models

Aug 5 2024

SAM 2: Segment Anything in Images and Videos
Gemma 2: Improving Open Language Models at a Practical Size
Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model
Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning
OmniParser for Pure Vision Based GUI Agent
SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities

14 mins

Text-Guided Image Inpainting, AMEX for Mobile GUI Agents, AgentScope's Multi-Agent Simulation

Jul 30 2024

Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model
LAMBDA: A Large Model Based Data Agent
AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents
BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation
Very Large-Scale Multi-Agent Simulation in AgentScope
Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?
Course-Correction: Safety Alignment Using Synthetic Preferences

14 mins

OpenDevin & AI Software Development, Enhancing Visual Language Models, DDK: Refining Large Language Model Efficiency through Domain Knowledge

Jul 25 2024

OpenDevin: An Open Platform for AI Software Developers as Generalist Agents
VILA^2: VILA Augmented VILA
HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation
PERSONA: A Reproducible Testbed for Pluralistic Alignment
SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency
Scalify: scale propagation for efficient low-precision LLM training
DDK: Distilling Domain Knowledge for Efficient Large Language Models

14 mins

Vocabulary Expansion for Large Models, Big Data Enhancing LMs, 4D Reconstruction Progress, AI Cityscape Generation, DPO Policy Analysis, Expanding Code Models, Multimodal LM Trust Evaluation

Jul 22 2024

Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies
Scaling Retrieval-Based Language Models with a Trillion-Token Datastore
Shape of Motion: 4D Reconstruction from a Single Video
Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion
Understanding Reference Policies in Direct Preference Optimization
Scaling Granite Code Models to 128K Context
Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study

15 mins

Qwen2 Language Model, Mitigating Privacy Risks in LLMs, Exploring Non-Determinism, Increased Efficiency with Q-Sparse, GRUtopia for Embodied AI

Jul 17 2024

Qwen2 Technical Report
Learning to Refuse: Towards Mitigating Privacy Risks in LLMs
The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism
Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
GRUtopia: Dream General Robots in a City at Scale

11 mins
