AI Text-to-Speech App
@Radiobooks

Introduction

Radiobooks is a start-up that leverages AI to convert books into audiobooks. Our software features an intuitive editing studio that offers users extensive control over the audio output, allowing them to customize it to fit their requirements. We prioritized delivering a well-designed, user-friendly experience.

How it Works

To convert a text document to audio using our software, you first create a new project by uploading an EPUB or PDF file, optionally uploading a cover, and choosing a narration voice from a large selection of languages. The document takes a few seconds to process. During this step, we run algorithms that analyze the submitted document (detecting images, tables, chapters, page numbers, etc.) and automatically decide which content should be read aloud and which should be skipped; for example, the publisher's details are automatically left out. Once processing is finished, you can click on your new project to enter the editing studio, where you'll have access to the document's text together with per-paragraph audio clips that are generated on the fly. At this point, you can listen to the audio and make any changes you need: tweaking specific text passages, adjusting pauses, changing narration voices, and fine-tuning audio speed, pitch, and volume. You can listen to the updated audio after each change. When you're satisfied with the result, you can download the full generated audio, or just a few chapters, for listening on the go.
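One way to picture this workflow is as a small data model. The following is only a minimal sketch under stated assumptions, not the actual Radiobooks backend: every name in it (BlockKind, AudioSettings, Block, Project, narrated_blocks) is hypothetical, the default narration rule is invented for illustration, and plain standard-library dataclasses are used to keep the example self-contained.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Set


class BlockKind(Enum):
    """Kinds of content the document analysis step might detect (hypothetical)."""
    PARAGRAPH = "paragraph"
    CHAPTER_TITLE = "chapter_title"
    IMAGE = "image"
    TABLE = "table"
    PAGE_NUMBER = "page_number"
    PUBLISHER_INFO = "publisher_info"


# Assumed rule for illustration: only these kinds are read aloud by default.
NARRATED_BY_DEFAULT = {BlockKind.PARAGRAPH, BlockKind.CHAPTER_TITLE}


@dataclass
class AudioSettings:
    """Per-block audio controls exposed in the editing studio."""
    voice: str = "default"
    speed: float = 1.0
    pitch: float = 0.0
    volume: float = 1.0
    pause_after_s: float = 0.5


@dataclass
class Block:
    kind: BlockKind
    text: str = ""
    chapter: int = 0
    settings: AudioSettings = field(default_factory=AudioSettings)
    narrate: Optional[bool] = None  # None means "use the default rule"

    def is_narrated(self) -> bool:
        # A manual choice made in the studio overrides the automatic decision.
        if self.narrate is not None:
            return self.narrate
        return self.kind in NARRATED_BY_DEFAULT


@dataclass
class Project:
    title: str
    blocks: List[Block] = field(default_factory=list)

    def narrated_blocks(self, chapters: Optional[Set[int]] = None) -> List[Block]:
        """Blocks to synthesize, optionally restricted to some chapters."""
        return [
            b for b in self.blocks
            if b.is_narrated() and (chapters is None or b.chapter in chapters)
        ]


project = Project(title="Example Book", blocks=[
    Block(BlockKind.PUBLISHER_INFO, "Published by Example Press", chapter=0),
    Block(BlockKind.CHAPTER_TITLE, "Chapter 1", chapter=1),
    Block(BlockKind.PARAGRAPH, "It was a quiet morning.", chapter=1),
    Block(BlockKind.PAGE_NUMBER, "1", chapter=1),
    Block(BlockKind.PARAGRAPH, "The next day was louder.", chapter=2),
])

# Fine-tune one paragraph, then pick only chapter 1 for download.
project.blocks[2].settings.speed = 1.1
texts = [b.text for b in project.narrated_blocks(chapters={1})]
print(texts)
```

Keeping the audio settings on each block mirrors the per-paragraph editing described above: changing one paragraph's speed or voice would only require regenerating that paragraph's audio, and exporting a few chapters becomes a simple filter over the blocks.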

The Motivation

The classical way of converting a book into an audiobook involves renting a studio, hiring a voice actor, hiring a sound engineer, and going through a recording-and-correction back-and-forth that usually lasts several months. Because this process is resource-intensive, only famous authors stand a chance of ever having their books converted into audio: small and independent authors, and other content creators, can only dream about it...

Human narration is the gold standard, so audiobooks that can be recorded by voice actors in a studio will continue to be recorded that way; however, they represent a small fraction of all existing books. Even if we take finances out of the equation, there is just too much written content and not enough people and time to convert it all to speech.

Today, fewer than 1% of all books have an audio version, and the great majority of those that do are narrated in English. Our aim is to facilitate the conversion of all the books that would never be converted to audio without this kind of technology (the other 99%). With that, we hope to improve education worldwide and to help content creators and independent authors, especially those in less technologically developed countries. All in all, we hope that our product becomes an example of ethical AI.

My Contribution

As Co-Founder, I have worn a few different hats in this project. On the technical side, my role has been to create the backend infrastructure: developing and implementing the required algorithms, plus all the DevOps paraphernalia, such as writing test suites and managing CI/CD pipelines, cloud deployment, cloud storage, logging, and database management. On the business side, I am working with my colleagues to clarify our vision and find ways of actualizing it, managing the product, talking with potential clients, and riding the usual startup rollercoaster with all its ups and downs.

Current State of Affairs

Due to financial constraints, we were unable to fully develop our product for commercial use before needing to seek revenue sources (typical startup challenges...). Currently, the company is on hiatus. However, I plan to continue working on the product as a side project until it's ready for the market, using this time as an opportunity to enhance my DevOps skills. If all goes well, I'll then pursue a self-funded B2C approach.

Code & Tech Stack

I added a few code samples from the project's backend in this GitHub repository. They are only meant for showcasing purposes. Here, I outline a few of the technologies that were used:

Python: Programming language.
FastAPI: Python web framework.
Pydantic: Python dependency used to create and validate DTOs.
OpenAPI/Swagger: REST HTTP API specification format.
Docker Compose: For management of application containers.
Fly.io: Global application platform used to deploy the containers.
Beanie: An asynchronous Python ODM for MongoDB.
PyMongo: The Python driver for MongoDB.
Pipenv: Python dependency management tool.
Bash: Unix shell and command language.
AWS aioboto3: Async AWS SDK.
GitHub Actions: For CI/CD automation.
Codecov: Test coverage tool.
Betterstack: Log management tool.

Acknowledgments

I would like to thank Rui for sharing this journey of professional and personal growth with me, and João Nogueira for his companionship and impressive frontend skills.