Vitaliy Danylov: "Voice is the oldest interface we have."

In 2025, 78% of companies worldwide use AI in at least one business function, up from 55% the year before, according to Hostinger. Artificial intelligence is entering all aspects of business, from marketing and sales to support and reporting.

Vitaliy Danylov—a software engineer, co-founder of a U.S.-based startup in the field of voice AI, and author of four scientific papers on multilingual speech synthesis and deepfake detection—is at the heart of these changes. With experience in business systems implementation at global manufacturing companies, in building cloud tools for Take-Two Interactive Software, and academic training from NYU and Boston University, he combines scientific depth with practical knowledge of how AI is reshaping the modern workplace.

"Companies were much more comfortable automating tasks that were already outsourced."

– Vitaliy, in your corporate career, you automated processes in large international firms—from financial reporting at Shiloh Industries to data and metadata workflows at Take-Two Interactive Software, a company that develops, publishes, and distributes video games. What patterns did you observe? Which office functions were most ready for replacement by automated solutions?

– Most automation happens when three factors align: the task is technically doable, significantly cheaper than using humans, and not regulated. When I looked at the processes across industries like manufacturing and entertainment tech, the clearest early candidates were workflows that were well-documented, repeatable on a daily or weekly basis, and not highly visible to top management. These tended to be back-office roles or internal support functions that quietly kept the corporate machine running.

Another pattern I observed was psychological: companies were much more comfortable automating tasks that were already outsourced to consultants. Automating overseas work, for example, was framed as "insourcing the knowledge" rather than cutting internal staff. Automation was often a way to reduce dependence on external vendors, not just a tool to reduce headcount expense.

– You've worked with enterprise systems and modern AI tools and have seen the evolution of process automation from within large international companies. How has your view evolved on which human skills are truly irreplaceable?

– Honestly, I no longer believe any skill is truly irreplaceable—at least not in the way we used to think. It's more accurate to say that certain skills are temporarily uneconomical to automate. But that's a moving target. What used to require judgment or intuition is increasingly being absorbed by systems trained on vast datasets.

One critical area where humans still dominate is in dealing with incomplete or ambiguous information. Large language models (LLMs) tend to hallucinate when faced with vague input; rather than asking for clarification, they often extrapolate with confidence and generate factually wrong answers. Most humans, by contrast, will pause and say, "I don't have enough information." That kind of restraint, along with the ability to ask follow-up questions, is still hard to replicate.

"We're going back to voice as the most efficient way to interact with machines."

– In 2024–2025, you published the book "Voice Without Borders: AI's Path to Universal Multilingual Communication" and three peer-reviewed articles on voice AI technologies. What key insights from your research led you to believe that voice will fundamentally change how we interact with technology?

– Voice is the oldest interface we have. Long before written language, humans and even animals relied on sound to communicate. That longevity suggests voice isn't going anywhere. Nassim Taleb's 'Lindy Effect' indicates that the longer a technology has been around, the longer it's likely to persist. By that logic, spoken language, which has been used for tens of thousands of years, may outlast text-based systems by centuries. What has changed recently is the technology: automatic speech recognition (ASR) systems, such as Whisper, have finally become accurate and fast enough to be of practical use. Before that, speech recognition was too clunky to replace typing. According to a Stanford study, the average human speech rate is 150–160 words per minute (wpm), while typing averages just 40–50 wpm on desktops and under 30 wpm on mobile devices. Speaking is therefore 3–5x faster than typing, and now that systems can handle noisy environments and accents reliably, there's no reason for typing to remain the default. My research confirmed what intuition already hinted: we're going back to voice, not as a novelty, but as the most efficient way to interact with machines.
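The 3–5x figure follows directly from the quoted rates. A minimal sketch of the arithmetic, where the function name `speedup_range` is a hypothetical helper introduced only for illustration:

```python
def speedup_range(speech_wpm, typing_wpm):
    """Return (lowest, highest) speech-to-typing speed ratio.

    Both arguments are (low, high) words-per-minute ranges.
    The lowest ratio pairs slow speech with fast typing; the
    highest pairs fast speech with slow typing.
    """
    speech_lo, speech_hi = speech_wpm
    typing_lo, typing_hi = typing_wpm
    return speech_lo / typing_hi, speech_hi / typing_lo


# Rates quoted in the interview: speech 150-160 wpm, desktop typing 40-50 wpm.
desktop_low, desktop_high = speedup_range((150, 160), (40, 50))

# Mobile typing is quoted as "under 30 wpm", so 150 / 30 is a lower bound.
mobile_floor = 150 / 30

print(f"desktop: {desktop_low:.1f}x to {desktop_high:.1f}x")
print(f"mobile: at least {mobile_floor:.1f}x")
```

On these numbers, speech comes out roughly 3x to 4x faster than desktop typing and at least 5x faster than mobile typing.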

– Your AI voice research includes authored papers on real-time voice synthesis, deepfake detection, and multilingual voice AI, all cited in academic and industry work. Given this, which breakthroughs in these areas do you see as essential for voice to truly replace text as the main communication channel within the next 3 to 5 years?

– The real obstacle isn't how natural the voice sounds; it's latency. Studies suggest that human-to-human response latency rarely exceeds 250–300 milliseconds. For AI to feel natural, the whole pipeline, including automatic speech recognition (ASR), LLM inference, and text-to-speech (TTS), needs to respond in under 500 ms end to end. People will tolerate a robotic tone far better than a 5-second delay. That means faster ASR, quicker LLM reasoning, and low-lag TTS. Deepfake protection matters, too, but the simplest solution is not cloning everyone's voice. Generic, high-clarity voices are often preferred, especially in business. They reduce misunderstandings and lower biometric risk. So the big leap won't come from sounding more human. It'll come from responding like one, in real time.
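One way to make the sub-500 ms target concrete is to treat it as a budget shared by the three stages. The sketch below is illustrative only: the per-stage timings are made-up placeholder values, not measurements of any real system, and the names `StageLatency`, `end_to_end_ms`, `within_budget`, and `slowest_stage` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StageLatency:
    name: str
    millis: float


def end_to_end_ms(stages):
    """Total latency when the stages run one after another."""
    return sum(stage.millis for stage in stages)


def within_budget(stages, budget_ms=500.0):
    """True if the sequential pipeline fits the end-to-end budget."""
    return end_to_end_ms(stages) <= budget_ms


def slowest_stage(stages):
    """The stage that contributes the most latency."""
    return max(stages, key=lambda stage: stage.millis)


# Placeholder timings (assumptions for illustration, not benchmarks).
pipeline = [
    StageLatency("ASR", 120.0),
    StageLatency("LLM", 250.0),
    StageLatency("TTS", 100.0),
]

print(end_to_end_ms(pipeline), within_budget(pipeline), slowest_stage(pipeline).name)
```

The sketch assumes the stages run strictly in sequence. A pipeline that streams partial results between stages would overlap them, so the simple sum here is a conservative upper bound.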

"Start thinking in terms of task structure, not job titles."

– Ten years ago, it seemed logical that robots would first replace factory workers on assembly lines, and only later, office employees. Based on your educational background in software development at Boston University and multiple years of experience across different industries, why did the reality turn out to be the opposite?

– It did seem that way, indeed. For years, we assumed physical labor was easier to automate than intellectual work because it looked simpler and more repeatable. But in practice, it turned out that automating human cognition—at least the parts involved in routine white-collar work—was easier than replacing human spatial awareness. That was the unexpected twist.

Digital employees can live entirely within cloud infrastructure, where everything is already structured, logged, and observable. You can see what they're doing, monitor inputs and outputs, and audit their actions in detail. Office work happens on screens, using files, emails, and structured systems. It's a controlled environment, and that's perfect for automation.

By contrast, physical work involves the real world. And the real world is messy. It's loud, irregular, and poorly mapped. Robotic systems still struggle with real-time physical interaction: how to grasp objects with variable textures, how to navigate cluttered spaces, how to detect micro-errors in muscle-like movements. These are hard problems. Just look at how most robots perform at tasks that seem natural to humans, such as folding clothing or ironing a shirt. Such tasks require a level of sensory coordination and adaptability that's far beyond where robotics is today, especially at a price point that makes economic sense.

So yes, the logic flipped. It turned out that intellectual labor is easier to model, monitor, and replace than physical work, at least for now.

– Your experience with Oracle Cloud and building predictive modeling tools based on datasets of automotive giants like Ford, GM, and Chrysler gives you insights into how complex business processes work. Which office roles will be replaced by digital employees first, and why?

– I think the key to understanding which office roles will be automated first is to stop thinking in terms of job titles and start thinking in terms of task structure. Most businesses don't care about job labels. They care about whether a task to be automated is predictable, low-risk, and economically justifiable.

From my experience, digital employees will enter first where the workflows lack external regulation and are high-volume, well-defined, and repetitive—things like internal support tickets, batch data processing, or answering simple customer service queries. These are the kinds of roles where a human doesn't need to think differently every time. If the task remains unchanged from yesterday and is likely to remain the same next week, a digital employee can probably handle it.

But there's more to it than just feasibility. It has to be worth it financially. Replacing a human with a digital employee only makes sense if it's significantly cheaper or better. As Peter Thiel famously argued in 'Zero to One,' successful new technologies need to be at least 10x better than what they replace. Anything less, and there's too much corporate inertia to overcome.

Also, the cost of a mistake matters. If an error could cost you a $10 million client, you're not going to let a model handle that interaction. But if the stakes are low—like helping someone reset a password or answering a basic onboarding HR question—the threshold for automation is much lower.

"Build your career around that delta."

– Based on your experience working at public companies with high market capitalization like Take-Two Interactive Software, how do investors evaluate the potential for replacing different types of workers? Where do they see the highest ROI?

– From what I've seen, investors think in terms of spreadsheets and probabilities. They would be asking one basic question: What's the net present value of automating this role over the next five years? That means calculating the savings from reduced labor costs, estimating the revenue gains from faster execution, and then discounting those future gains against the implementation cost and risk. It's not emotional. It's DCF math.
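The calculation described above can be sketched as a standard net present value formula. This is a minimal illustration under stated assumptions, not the method any particular investor uses: all dollar figures are invented, the function names `npv` and `automation_npv` are hypothetical, yearly benefits are assumed constant, and risk is folded into the discount rate.

```python
def npv(discount_rate, yearly_net_benefits, upfront_cost):
    """Net present value: discounted future benefits minus upfront cost.

    yearly_net_benefits[0] is received at the end of year 1, and so on.
    """
    discounted = sum(
        benefit / (1 + discount_rate) ** year
        for year, benefit in enumerate(yearly_net_benefits, start=1)
    )
    return discounted - upfront_cost


def automation_npv(labor_savings, revenue_gain, running_cost,
                   implementation_cost, discount_rate, years=5):
    """NPV of automating a role, assuming the same net benefit every year."""
    yearly = labor_savings + revenue_gain - running_cost
    return npv(discount_rate, [yearly] * years, implementation_cost)


# Invented example figures (assumptions for illustration only).
value = automation_npv(
    labor_savings=120_000,        # reduced labor cost per year
    revenue_gain=30_000,          # extra revenue from faster execution
    running_cost=50_000,          # licenses, hosting, maintenance per year
    implementation_cost=250_000,  # one-time cost, paid up front
    discount_rate=0.12,           # a higher rate stands in for higher risk
)
print(f"5-year NPV: ${value:,.0f}")
```

A positive result means the discounted gains exceed the implementation cost. Raising the discount rate to reflect a riskier project shrinks the NPV, which is one simple way to capture the caution about risk described in the answer.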

What investors are also cautious about is legal exposure, compliance issues, and brand damage. If automating a role introduces regulatory risk or if the digital employee could embarrass the company with a bad decision, that becomes a red flag. They also worry about vendor lock-in—being too reliant on a single AI provider like OpenAI or Google would introduce a form of platform risk.

So ultimately, investors look at automation the same way they look at any other investment: high ROI and low risk win. And if those conditions aren't met, they'll keep humans in the loop, at least for now.

– As an expert who has worked in financial services, energy companies, automotive manufacturing, and the video game development industry, in which areas will human–digital employee coexistence be most harmonious?

– One of the best ways to think about this is to borrow a framework from robotics: the 5Ds—Dirty, Difficult, Dangerous, Dull, and Dear. Those are the types of tasks where humans have historically been happy to let machines step in. And they apply just as well to digital employees in office settings as they do to physical robots on factory floors.

If the work is repetitive, mentally exhausting, offers no upward mobility, and is difficult to staff, people will be glad to hand it over. Think about the jobs nobody dreams of, and then you can be sure that digital employees can hold those jobs with zero resentment.

But when the work is visible, strategic, or tied to career advancement, people are much less willing to share the space. If a digital assistant gets assigned to a high-profile project that someone was hoping to use to get promoted, you'll see real friction. Harmony comes when digital employees handle the work nobody wants, not the work everyone is competing for.

– You mentor students and startup founders, and this year you received a commendation letter from the NYU Alumni in Tech Club for that work. What skills do you recommend students develop to work effectively alongside digital employees?

– The first skill I recommend is surprisingly simple: learn where large language models fail. Understand when they hallucinate and why they make things up. That alone would put you ahead of most people in the market. The point here is not to compete with digital employees where they're strong, because you'll lose. Instead, find the gaps. Digital employees are great at doing the same thing over and over with perfect memory and no sleep. Humans are great at dealing with edge cases, interpreting vague instructions, and understanding social context. So, build your career around that delta.

I also tell students to think in terms of ownership. Don't just "do tasks"; own outcomes. Be the person responsible for the result, not someone who responds the way an LLM would, with a short note: "I apologize, but something went wrong." If something goes wrong, be the one who owns and fixes it. That mindset of accountability, not just execution, is what separates humans from bots in the long term.
