• How One Engineer Cut Logging Costs 90 Percent Without Losing Observability
    Jun 29 2026
    Episode 80 of The Software Engineering Podcast dives into a specific cost-optimization story: how a senior engineer at a mid-size fintech company reduced their cloud logging bill by 90 percent — from $80,000 per month to under $8,000 — without sacrificing the signal their on-call team relied on. Lucas and Luna walk through the technical decisions: switching from structured JSON logging to a custom binary format with protobuf, implementing a two-tier retention policy that kept high-cardinality metrics hot for only 24 hours, and writing a smart sampling layer that preserved 100 percent of error traces while dropping 95 percent of repetitive success logs. They discuss the trade-offs — longer query times on cold storage, the learning curve for the team, and the initial pushback from developers used to grep-friendly logs. The episode ends with a practical framework any team can adapt: measure your log volume per service, identify the noisiest sources, and ask whether every field in every log line earns its storage cost. #SoftwareEngineering #CloudCosts #Observability #Logging #Fintech #Protobuf #BinaryFormat #Sampling #RetentionPolicy #CostOptimization #EngineeringPodcast #FexingoBusiness #BusinessPodcast #Technology #LogManagement #Scalability #DevOps #SiteReliability Keep every episode free: buymeacoffee.com/fexingo
    9 mins
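The sampling idea in this episode's summary (keep every error, drop about 95 percent of repetitive success logs) can be sketched as below. This is only an illustration, not the engineer's actual layer: the record fields `level` and `trace_id`, the function name `should_keep`, and the hash-based bucketing are all assumptions made for the example.

```python
import hashlib

# Keep 5% of success logs, i.e. drop 95% (the figure quoted in the summary).
SUCCESS_KEEP_RATE = 0.05


def should_keep(record: dict) -> bool:
    """Decide whether a log record is stored (hypothetical record shape)."""
    # Error-level records are never sampled out.
    if record.get("level") in ("ERROR", "CRITICAL"):
        return True
    # Hash a stable key so every line of one trace gets the same decision.
    key = str(record.get("trace_id", "")).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return bucket < SUCCESS_KEEP_RATE


# Synthetic traffic: 10,000 success records and 100 error records.
records = [{"level": "INFO", "trace_id": f"t-{i}"} for i in range(10_000)]
records += [{"level": "ERROR", "trace_id": f"e-{i}"} for i in range(100)]

kept = [r for r in records if should_keep(r)]
kept_errors = sum(1 for r in kept if r["level"] == "ERROR")
kept_success = len(kept) - kept_errors
print(f"errors kept: {kept_errors}, success kept: {kept_success}")
```

Hashing the trace ID instead of calling a random number generator keeps the decision deterministic, so a sampled trace is stored whole rather than in fragments.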
  • How One Engineer Repaired a Corrupt Git Repository Without Losing History
    Jun 28 2026
    When a junior developer force-pushed over a shared branch and corrupted the entire Git history, most teams would panic. In this episode, Lucas and Luna break down how one engineer rescued a 14-month-old codebase using git reflog, filter-repo, and careful cherry-picking. They walk through the specific commands, the decision tree for when to rewrite history versus when to accept it, and the single backup practice that saved the team from losing 900 commits. If you've ever wondered what to do when Git itself seems broken, this is the episode for you. #Git #VersionControl #GitReflog #GitFilterRepo #CodeRecovery #EngineeringBestPractices #DevOps #SourceControl #SoftwareEngineering #Technology #FexingoBusiness #BusinessPodcast #CodeRescue #HistoryRewrite #CherryPick #ForcePush #Debugging #DeveloperTools
    8 mins
  • How One Engineer Debugged a Kubernetes Pod Eviction That Wiped 5000 Jobs
    Jun 28 2026
    In this episode of The Software Engineering Podcast with Fexingo, Lucas and Luna dive into a production nightmare: a Kubernetes cluster that silently evicted over 5000 batch jobs over three weekends. They walk through how one engineer at a data processing startup traced the root cause to a subtle interaction between kubelet resource reservation defaults and a misconfigured eviction threshold. Learn how she used Prometheus metrics, a custom admission webhook, and a prioritization framework to prevent it from happening again. A masterclass in debugging distributed systems under pressure. #Kubernetes #PodEviction #DevOps #SiteReliabilityEngineering #DistributedSystems #BatchProcessing #Prometheus #AdmissionWebhook #DataProcessing #ProductionDebugging #CloudNative #SRE #EngineeringResilience #IncidentResponse #FexingoBusiness #BusinessPodcast #Technology #SoftwareEngineering
    8 mins
  • How One Engineer Fixed a Sidekiq Memory Bloat That Crunched Servers Every 72 Hours
    Jun 27 2026
    Episode 77 of The Software Engineering Podcast digs into a deceptively simple bug: a Sidekiq worker that ballooned in memory every 72 hours, forcing the ops team to restart it manually. Lucas and Luna walk through how one engineer discovered the culprit, a cached ActiveRecord relation that never cleared, and how a single call to `.reload` cut memory usage by 80 percent. They discuss lazy evaluation pitfalls in Ruby, the importance of profiling in production, and why a ten-line fix can save a team six figures in infrastructure costs. If you've ever fought a memory leak that only shows up after days of uptime, this episode is for you. #Sidekiq #RubyOnRails #MemoryLeak #BackgroundJobs #ActiveRecord #LazyEvaluation #RubyMemoryProfiling #DerailedBenchmarks #MemoryBloat #72HourBug #ProductionDebugging #EngineeringStory #SoftwareEngineering #Technology #FexingoBusiness #BusinessPodcast #CodeQuality #PerformanceOptimization
    8 mins
  • How One Engineer Handled a Database Migration With Zero Downtime Using Flyway
    Jun 27 2026
    In episode 76 of The Software Engineering Podcast, Lucas and Luna dive into the story of a senior engineer at a mid-sized e-commerce company who migrated a critical PostgreSQL database from a single instance to a replicated cluster without any downtime. The migration involved 200 GB of data, 50 tables, and a tight deadline. The engineer used Flyway for schema versioning, pglogical for replication, and a careful cutover strategy that included read-only mode, dual writes, and a final switch. They walk through the step-by-step approach, the pitfalls that were avoided (like schema drift and replication lag), and the key lesson: safe migrations are about orchestration, not just tools. If you've ever dreaded a database migration, this episode is for you. #DatabaseMigration #Flyway #PostgreSQL #ZeroDowntime #Engineering #Technology #SoftwareEngineering #pglogical #SchemaVersioning #Ecommerce #DevOps #DataEngineering #Production #MigrationStrategy #Database #FexingoBusiness #TechPodcast #Code
    5 mins
  • How One Engineer Rewrote a Legacy Database Without Downtime
    Jun 26 2026
    Episode 75 of The Software Engineering Podcast tells the true story of a senior engineer at a mid-sized logistics firm who migrated a 15-year-old PostgreSQL monolith to a sharded, horizontally scalable architecture without a single minute of planned downtime. The project touched 4.3 million lines of trigger code and 2,800 stored procedures. By combining logical replication, a write-ahead log change data capture pipeline, and a phased cutover with canary reads, the team moved 12 terabytes of data incrementally over six weeks. The episode breaks down the exact strategy: how they avoided dual-write complexity, handled schema drift, and rolled back within 90 seconds when a hot-spot partition caused latency spikes. Lucas and Luna discuss the trade-offs between trigger-based replication and streaming replication, why the team chose NOT to use an ORM abstraction layer, and what happened when a foreign key constraint broke the CDC pipeline at 2 AM. This is a deep, practical look at legacy database modernisation for engineers facing similar migrations. #LegacyDatabaseMigration #PostgreSQL #DatabaseSharding #ChangeDataCapture #ZeroDowntimeMigration #LogicalReplication #SoftwareEngineering #TechPodcast #DatabaseArchitecture #ProductionEngineering #LucasAndLuna #FexingoBusiness #BusinessPodcast #EngineeringBestPractices #DataMigration #Postgres #WriteAheadLog #CanaryDeployments
    8 mins
  • How One Engineer Debugged a Floating Point Bug That Cost a Drone Company $2 Million
    Jun 26 2026
    In this episode of The Software Engineering Podcast with Fexingo, Lucas and Luna dive into a real-world debugging nightmare: a floating-point precision bug in a drone navigation system that caused erratic flight patterns and led to $2 million in lost contracts. They break down the core issue, why 0.1 + 0.2 does not equal 0.3 in IEEE 754 binary floating-point representation, and walk through the specific scenario where accumulated rounding errors in a Kalman filter threw off position estimates by over a meter. The conversation covers how the engineer traced the bug using delta debugging, why a switch to fixed-point arithmetic fixed it, and what lessons every developer should take away about numerical computing in safety-critical systems. If you've ever wondered why your calculations seem off, or how a small numerical mistake can sink a mission (NASA's Mars Climate Orbiter was lost to a unit mismatch rather than a floating-point error), this episode will give you a concrete, actionable understanding of floating-point gotchas. No abstract theory: just the actual numbers and code-level decisions that made the difference between a working drone and a $2 million mistake. #FloatingPoint #DroneEngineering #Debugging #NumericalPrecision #IEEE754 #KalmanFilter #FixedPointArithmetic #SafetyCriticalSystems #DeltaDebugging #SoftwareEngineering #Technology #Programming #EmbeddedSystems #RealWorldBug #FexingoBusiness #BusinessPodcast #CodeQuality #TechPodcast
    9 mins
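The floating-point effect named in this episode's summary is easy to reproduce. The sketch below is not the drone company's Kalman filter code; it only shows that 0.1 + 0.2 differs from 0.3 in binary floating point, that the error grows when a value is accumulated many times, and that an integer fixed-point representation avoids the drift. The scale of 10 (integer tenths) and the step count are assumptions chosen for the example.

```python
# Neither 0.1, 0.2, nor 0.3 has an exact binary floating-point representation.
is_equal = (0.1 + 0.2 == 0.3)
print("0.1 + 0.2 == 0.3 ?", is_equal)

STEPS = 1_000_000  # arbitrary number of accumulation steps for the demo

# Floating point: each addition rounds, and the rounding errors pile up.
float_total = 0.0
for _ in range(STEPS):
    float_total += 0.1
float_drift = abs(float_total - STEPS / 10)
print("float drift after accumulation:", float_drift)

# Fixed point: store the value as an integer count of tenths (hypothetical scale).
SCALE = 10
fixed_total = 0
for _ in range(STEPS):
    fixed_total += 1  # 0.1 expressed as one tenth
fixed_value = fixed_total / SCALE  # convert once, at the very end
print("fixed-point result:", fixed_value)
```

Integer addition is exact, so the fixed-point total carries no accumulated error; the price is a fixed resolution and range that have to be chosen up front.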
  • How a Single Index Cut Query Time from Minutes to Milliseconds
    Jun 25 2026
    In this episode, Lucas and Luna dive into the story of how one database index, just one, turned a query that took over two minutes into a response that came back in under twenty milliseconds. They walk through the specific scenario: a PostgreSQL database on AWS RDS, a SaaS startup seeing page load times spike to critical levels, and the engineer who found the problem by examining the query plan. Along the way, they explain what a B-tree index actually does under the hood, why sequential scans kill performance at scale, how to read an EXPLAIN ANALYZE output like a pro, and the counterintuitive truth that adding an index can sometimes slow things down. This is a concrete, actionable episode for anyone who works with databases: no theory without practice. #DatabasePerformance #PostgreSQL #QueryOptimization #Indexing #BTreeIndex #ExplainAnalyze #SequentialScan #RDS #AWS #BackendEngineering #SoftwareEngineering #TechPodcast #DatabaseTuning #Latency #ProductionDebugging #FexingoBusiness #BusinessPodcast #Technology
    11 mins
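The scan-versus-index contrast in this episode's summary can be tried locally. The sketch below deliberately swaps in SQLite (bundled with Python) and its `EXPLAIN QUERY PLAN` statement instead of PostgreSQL's `EXPLAIN ANALYZE`, so the plan wording differs from what the episode discusses. The `orders` table, its columns, and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(50_000)],
)

QUERY = "SELECT * FROM orders WHERE customer_id = ?"


def plan(sql: str) -> str:
    """Return SQLite's query plan for sql as a single line of text."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (42,)).fetchall()
    # The last column of each row holds the human-readable plan step.
    return " | ".join(row[-1] for row in rows)


plan_before = plan(QUERY)  # no index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_orders_customer_id ON orders (customer_id)")
plan_after = plan(QUERY)   # the planner can now search the B-tree index

print("before:", plan_before)
print("after: ", plan_after)
```

The same habit carries over to PostgreSQL: inspect the plan before and after adding the index rather than assuming the planner will use it.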