Voice interfaces introduced by smart speakers present new opportunities and challenges for podcast content recommendations. Understanding how users interact with voice-based recommendations can inform better design of vocal recommenders. However, existing knowledge about user behavior comes mostly from visual interfaces, such as the web, and is not directly transferable to voice interfaces, which rely on listening and do not support skimming or browsing. To fill this gap, we conducted a controlled study comparing user interactions with recommendations delivered visually to interactions with recommendations delivered vocally. Through an online A/B test with 100 participants, we found that when recommendations are conveyed vocally, users consume them more slowly, explore less, and choose fewer long-tail items. The study also reveals a correlation between users' choices and their exploration in voice interfaces. Our findings pose challenges for the design of voice interfaces, such as adaptively recommending diverse content and designing better navigation mechanisms.