Why AI Inference Startups Are Undercutting the Cloud Giants

Lucas and Luna explore how a new wave of AI inference startups, including Groq, Fireworks AI, and Together AI, is offering faster, cheaper model deployment than AWS, Azure, or Google Cloud. With NVIDIA's stock at $210 and AMD climbing 5% in a week, the hardware landscape is shifting, but the real battle is in software: specialized inference engines that cut latency by 80% and cost by half.

Lucas walks through the economics: a typical production AI workload running on AWS Inferentia costs roughly $0.30 per hour, while a startup's custom stack can handle the same workload for under $0.10. Luna questions whether the startups can sustain their margins once the big cloud providers inevitably drop prices. The conversation also touches on ARM's surprising 15% weekly gain, fueled in part by inference-optimized chip designs.

The episode takes a concrete look at the infrastructure layer that will determine whether generative AI becomes a utility or a luxury.

#AIInference #CloudComputing #NVIDIA #AMD #ARM #Groq #FireworksAI #TogetherAI #AWS #Azure #GoogleCloud #InferenceStartups #TechInfrastructure #GenerativeAI #LLM #CloudPricing #FexingoBusiness #Technology

Keep every episode free: buymeacoffee.com/fexingo