Ep. 65: Klang.io CEO Sebastian Murgul on the Role of A.I. in Sheet Music
Episode Description:
Sebastian Murgul is the co-founder and CEO of klang.io, a music software company based in Germany that has developed a suite of tools, including Transcription Studio, Piano2Notes, and MelodyScanner, that use AI to transcribe audio recordings into PDF, MIDI, and MusicXML. We had a fantastic conversation about the underlying technology, as well as an honest conversation about what the future of the music industry might look like.
Featured On This Episode:
Sebastian Murgul
Sebastian Murgul is an inventor, entrepreneur, and the co-founder and CEO of klang.io.
Episode Transcript:
*Episode transcripts are automatically generated and have NOT been proofread.*
Hey, Gerrit.
Thanks for inviting me. I’m doing very well here.
Where are you based?
I’m based in Karlsruhe, Germany, which is pretty far in the south of Germany, next to the French border. So quite in the heart of Europe.
Now forgive the ignorant question, but it does seem like there’s a lot of music tech that happens in Germany. What do you think is the reason for that?
Oh, it’s very interesting. I think most of the music tech companies are based in, I think, the Berlin area or maybe Munich or Hamburg. Here in Baden-Württemberg, there aren’t that many music tech companies actually.
But it’s very interesting and I think it’s getting more and more. Maybe it’s the German engineering spirit that finally marries with music to create something great.
Well, like I said, maybe I’m just ignorant, but I’ve been noticing the last couple of years, there’s more and more tech startups and just energy happening in the tech space in Germany. So it’s just something I wanted to point out.
Yeah, we actually have a really… It’s like a group that organizes the companies in the music tech area here in Germany. It’s called MusicTech Germany.
And they are also starting to connect in a European way with MusicTech Europe. And I think they’re really helping to shape the music tech landscape here.
Well, and you’re right in the middle of that landscape, which is why I’m so excited to talk to you, because there’s so much going on just all over the place with AI, but especially in the music industry, or at least that’s what I care the most about.
And I think there’s this sort of equal parts, you know, fear and terror, but also excitement that surrounds this whole thing. Right. So why don’t we start with your company, klang.io.
What is it that you do?
So, what we are basically doing is something that maybe all of your listeners need or are very interested in, is we’re doing automatic music transcription using artificial intelligence.
And we are building several apps and tools around this to make the process as easy and as fast as possible.
And it’s a very interesting research area because when we started, I think it was back in 2017, 2018, there wasn’t that much technology for transcribing music, especially in the AI field.
There weren’t that many frameworks available to develop something like that. And it is really accelerating and getting faster and faster, and the results are also getting better and better, which is really, really amazing.
And for us, we are right at the front and trying to do our best to accelerate the process further.
So how does that technology work?
Oh, so we’re just stepping right into the deep tech. Okay, yeah. So I think all of you are familiar with spectrograms.
Just kidding.
Assume I know nothing.
Okay, yeah. Well, so we basically start with an audio signal, and it could be an audio file, a microphone recording, or whatever you like to transcribe. And then we do some sort of visualization for the artificial intelligence.
We have to make something that is processable. So in terms of music transcription or music analysis in general, we are typically dealing with these spectrograms, which are images, two-dimensional images.
One dimension is the time, which is the X-axis, and then you also have the frequency, which is the Y-axis, and you can imagine it like a colorful picture where you see these different lines of the notes blending into each other.
And yeah, you see in the brightness of the pixels of the picture, how much energy is in this frequency and time point. And this is basically how a computer can then understand the audio signal and try to figure out stuff that we want to extract.
And basically we’re doing three things. First of all, we’re doing rhythmic analysis. We’re trying to figure out the underlying beat grid to find the start of the bars, find pickup bars, and determine the tempo and time signature.
Then we’re doing note tracking, so detecting all the notes that are played by a certain musical instrument. And third, we are doing some sort of harmonic analysis to find out the overall harmonic structure. So, for example, key signature and chords.
And then all of this gets fused to hopefully readable and playable sheet music, which is then output to the user.
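To make the spectrogram description above concrete, here is a minimal sketch in plain Python. It is not klang.io’s pipeline: the function names, the 8 kHz sample rate, and the 256-sample frame size are illustrative assumptions, and a slow textbook Fourier transform is used only to keep the example free of dependencies. Each row of the result is one point in time, each column is one frequency, and the value is the "brightness" of that pixel.

```python
import cmath
import math

SAMPLE_RATE = 8000   # samples per second (assumed; kept low so the demo runs fast)
FRAME_SIZE = 256     # samples per analysis frame (assumed)


def tone(freq_hz, num_samples):
    """Synthesize a pure sine tone as a list of floats."""
    return [math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n in range(num_samples)]


def spectrogram(samples, frame_size, hop_size):
    """Return one row per time frame; each row holds the magnitude per frequency bin."""
    # A Hann window softens the frame edges to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    rows = []
    for start in range(0, len(samples) - frame_size + 1, hop_size):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        row = []
        for k in range(frame_size // 2 + 1):  # bins from 0 Hz up to Nyquist
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n, x in enumerate(frame))
            row.append(abs(coeff))
        rows.append(row)
    return rows


def loudest_hz(row):
    """Frequency in Hz of the brightest 'pixel' in one time frame."""
    peak_bin = max(range(len(row)), key=row.__getitem__)
    return peak_bin * SAMPLE_RATE / FRAME_SIZE


# A quarter second of A4 (440 Hz) followed by a quarter second of A5 (880 Hz).
audio = tone(440.0, 2000) + tone(880.0, 2000)
spec = spectrogram(audio, FRAME_SIZE, hop_size=FRAME_SIZE)

print(len(spec), "time frames x", len(spec[0]), "frequency bins")
print("first frame peaks near", loudest_hz(spec[0]), "Hz")
print("last frame peaks near", loudest_hz(spec[-1]), "Hz")
```

The peaks land on the nearest frequency bin rather than exactly on 440 Hz and 880 Hz, because a 256-sample frame at this sample rate can only resolve steps of 31.25 Hz. Real systems typically use longer frames, overlapping hops, and a fast Fourier transform.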
Speaking just generally of AI, is there more than one method of transcribing audio, or is what you’re describing pretty much the standard that everyone’s trying to perfect?
I think there are several standards. Most of the AI apps out there are just performing an audio to MIDI transcription and then are using, I don’t know, maybe a MuseScore automatization to get some, at least half readable sheet music out of it.
And a lot of research is missing out the final part, but the last piece, the creation of something readable and playable, is, I think, the most important one, which is the most difficult, too, because it’s not about just finding out the exact time in seconds for the start and the end of a specific sound.
It’s more about understanding the underlying musical structure and the intentions that the composer was trying to communicate in the musical piece.
And that’s something that is very intuitive for humans, but it’s very, very hard to model with an artificial intelligence.
Well, as I understand it, you have to essentially teach it to do two things, right? You have to teach it to understand the music, but then also how to write it.
And what I hear you saying is the second part is more difficult, which I wouldn’t necessarily think, right? Because I feel like there’s lots of available data on how you notate music, right? There’s lots of books, there’s lots of lessons, right?
That information is all out there. I feel like how to trans… To me anyway, the audio recognition seems like it would be harder.
Of course, the audio recognition is also still not easy, especially if you have multiple instruments playing at the same time.
For example, in a symphonic orchestra, this is really, really hard and we are far away from getting this sorted out.
But, for example, in a band context or something like that, it’s already quite possible to figure out the notes that were sung, that were played with the guitar, with the drums, with the bass.
And the other problem really is to get all of this information, to understand it in a musical way, to calculate out the performance part, because the performer brings his own or her own style into the music and is also doing some transformations far away from the dry sheet music and bringing creativity.
And we kind of have to remove this creativity and to calculate back to the original sheet music, which is a very interesting and not always unique result you can get.
So how is the AI, by the way, does your AI have a name?
We call it the klang.io AI. Basically, it doesn’t have a name, but the servers that run the AI are called minions, like in Despicable Me.
A lot of small little servers that are actually really dumb, but still are helping the user to fulfill its master plan.
Well, and I meant to ask you this earlier, where did the name of the company come from? Klang?
Yeah. Originally, we were named Melody Scanner after our first product. But after some years, we started another application, another app called Piano2Notes.
And then we thought, hey, maybe it’s not the best thing to be just named after one product. We have to find something that is more than just one thing, but that covers more of the whole vision that we have for our company.
And then we thought about, hey, what are some cool words, maybe a German word, that kind of works internationally. And then we came to Klang, which is the German word for sound and IO input output. So we’re basically doing, yeah, sound processing.
Yeah, yeah.
No, it makes total sense. So going back to the training, is it, is it like ChatGPT, you know, where you’re just feeding it everything you can find? Is it more targeted or is it more based on recordings or sheet music or MIDI?
Like, how does, I’m assuming that the training is a large part of this development process. What does that look like?
Yeah, the training is a really hard and difficult part, especially when you’re based here in Europe, and first and foremost here in Germany, because we have some really strict copyright law.
And yeah, using just everything you find around on the Internet isn’t quite the strategy that we wanted to follow. And we also asked our lawyer, hey, how is it possible to train an AI to do what we are doing?
And the solution for us, at least, is we are using synthetic training examples that we artificially compose and artificially sonify and create basically in an automated DAW all the songs or the training data that we need to train our AI.
And then we just learn to do it backwards, which also has the advantage that all of our training data, all of the scores and the audio are perfectly in sync, which makes the training much easier.
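The idea of composing and sonifying training examples artificially, so that score and audio are in sync by construction, can be sketched in a few lines. This is only an illustration under loud assumptions: klang.io uses an automated DAW, while this sketch uses random notes and bare sine tones, and every name here (`make_example`, `midi_to_hz`, the label fields) is hypothetical.

```python
import math
import random

SAMPLE_RATE = 8000  # samples per second (assumed)


def midi_to_hz(midi_note):
    """Convert a MIDI note number to a frequency; note 69 is A4 at 440 Hz."""
    return 440.0 * 2 ** ((midi_note - 69) / 12)


def make_example(num_notes, samples_per_note, rng):
    """'Compose' random notes and render them, returning (audio, labels).

    The labels are written at the same moment as the audio, so each onset
    and offset matches the waveform exactly, with no manual alignment.
    """
    audio = []
    labels = []
    for i in range(num_notes):
        pitch = rng.randint(60, 72)  # one octave starting at middle C
        onset = i * samples_per_note
        labels.append({"midi": pitch,
                       "onset": onset,
                       "offset": onset + samples_per_note})
        freq = midi_to_hz(pitch)
        audio.extend(math.sin(2 * math.pi * freq * n / SAMPLE_RATE)
                     for n in range(samples_per_note))
    return audio, labels


rng = random.Random(0)  # fixed seed so the generated data set is reproducible
audio, labels = make_example(num_notes=4, samples_per_note=2000, rng=rng)

for label in labels:
    print(label)
print("audio length in samples:", len(audio))
```

A transcription model would then be trained in the opposite direction, taking `audio` as input and trying to reproduce `labels`.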
Like this is the part that my brain just can’t even comprehend, but is it a process of like, here’s a, you know, Klang, here’s a song, now try to write this song and then correcting the mistakes and then trying again?
Or is it analyzing a large data set and then trying some things and then like, what’s the back and forth like?
We created some subproblems that we wanted to solve, like the rhythm analysis or the note tracking.
And we don’t train just one end-to-end AI that does all of this in one step. We have an AI system, as we call it, with several AI modules working together like an orchestra.
And we are solving each of the problems separately, especially for each musical instrument, because each musical instrument has its own tweaks and preferences, especially in notation.
And then we basically, maybe let’s start with beat detection or something like that, detecting the beat grid. Then we would just take a lot of musical recordings with the underlying beat annotations or the ground truth counts.
And then we show the AI, hey, that’s the audio and that’s the beat that we want to get out of you. Learn the connection. What are the processing steps that you have to do in order to get from here to there?
And then you’re doing that. It’s typically you create a training data set with a huge amount of examples. And then you wait for maybe a day, a week.
Then the model is fully trained. And then you assess the quality of the model. And you find out, hey, did it learn what it should learn, or did it completely miss out the point and does something completely random?
I think maybe it’s like sending a kid to school and hoping that it learns the things that it should learn and then check in a test how good it performs.
So is it harder to teach Klang or a five-year-old how to play the piano?
I don’t have a five-year-old. And actually, I don’t play the piano, unfortunately. I play the guitar.
So I think teaching a five-year-old to play the piano is definitely harder for me.
Fair enough. Do you have to teach the AI to understand like this is a violin, this is a viola, this is a cello, or does it just need to understand that these are different notes?
Does the training need to be specific to each instrument, or is it just a question of understanding the layers that are involved in audio?
That’s a really good question. We did some experiments with instrument-wise recognition and function-wise recognition so that you basically either tell the AI, here, these are the different instruments. This is what a note from a violin sounds like.
This is what a note from a piano sounds like. And then it learns how the timbres are different and can separate the layers like in source separation. Like there are a lot of tools that are already doing this.
Or if you say, hey, if you have a classical score, then you typically have these and these functions. Or in a rock band, for example, which is maybe a more, yeah, more direct or more known comparison.
If you have, for example, a rhythm guitar and a lead guitar. So you have, it doesn’t matter if the first one is an acoustic and the second one is an electric guitar. It’s more about what function it has in your rock song.
So the answer is, yeah, both, actually. It really depends on the output that you want. And that is the amazing thing, that there are many ways to get from A to B.
And before the whole deep learning world got crazy, there was a lot of just signal processing algorithms that were out there and were used.
So like just finding out which frequency is the loudest, that’s a pretty simple, straightforward way to transcribe a simple melody. Yeah.
And there are also other machine learning-based approaches, like clustering, where you think, hey, there are two instruments in my recording, figure out which could be the instruments.
So it really depends on your approach and on your output that you want to have.
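The simple signal-processing approach mentioned above, picking the loudest frequency to transcribe a simple melody, can be sketched as follows. This is an illustrative toy rather than any product’s method: the names, the 8 kHz sample rate, and the 512-sample frame are assumptions, and the test melody is synthesized from pure sine tones with one note per frame.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (assumed)
FRAME_SIZE = 512    # samples per analysis frame (assumed)
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def tone(freq_hz, num_samples):
    """Synthesize a pure sine tone as a list of floats."""
    return [math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n in range(num_samples)]


def loudest_frequency(frame):
    """Return the frequency in Hz of the strongest bin in one frame."""
    size = len(frame)
    # Hann window to reduce leakage between neighboring frequency bins.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * n / (size - 1)))
                for n, x in enumerate(frame)]
    best_bin, best_mag = 1, 0.0
    for k in range(1, size // 2):  # skip bin 0, which is the DC offset
        mag = abs(sum(x * cmath.exp(-2j * math.pi * k * n / size)
                      for n, x in enumerate(windowed)))
        if mag > best_mag:
            best_bin, best_mag = k, mag
    return best_bin * SAMPLE_RATE / size


def hz_to_note_name(freq_hz):
    """Round a frequency to the nearest equal-tempered note, e.g. 440 Hz to A4."""
    midi = round(69 + 12 * math.log2(freq_hz / 440.0))
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1)


def transcribe(samples):
    """Name the loudest note in each non-overlapping frame."""
    return [hz_to_note_name(loudest_frequency(samples[start:start + FRAME_SIZE]))
            for start in range(0, len(samples) - FRAME_SIZE + 1, FRAME_SIZE)]


# An A major arpeggio, one frame per note: A4, C#5, E5.
melody_audio = (tone(440.00, FRAME_SIZE)
                + tone(554.37, FRAME_SIZE)
                + tone(659.26, FRAME_SIZE))
melody = transcribe(melody_audio)
print(melody)
```

This only works for a single clean melody line. With real instruments an overtone can be louder than the fundamental, and with several instruments playing at once the loudest frequency no longer identifies one note, which is part of why learned models took over.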
What is the business model like for the company? How do you make money and how do you see that evolving as the software develops? Are you trying to scale and become a program that everybody uses?
Are you trying to license your technology to other companies? What do you see the future of the company being?
When we started out in 2018, we started with a subscription-based model with yearly and monthly plans for just the one app that we had. Back then, it was MelodyScanner.
MelodyScanner, the idea behind that was to figure out the underlying melody that is played on whatever song you give into it.
But then we quickly realized, hey, the people that are using the app are mostly playing the piano, and they are also inputting a lot of complex piano pieces.
So we thought, hey, maybe we should add an app that transcribes one-to-one your piano pieces, and that’s the origin of Piano2Notes. Then Piano2Notes became quite a success for us with the same business model, also subscriptions.
We tried out with some ticket base so that you buy a bunch of transcription tickets that you can use whenever you need them.
But we quickly found out, hey, that’s just something that is too complex to communicate, so what we did instead is we added a single transcription purchase option, especially for the people here in Germany, because they are really not into giving away credit card information or even using PayPal.
They just want maybe something with an invoice that they can send money to, and they don’t even want to create an account or something like that, and that’s something that we achieve with the single transcriptions.
Then from the piano app, we also started a guitar app, then came a singing app, drum app, violin, wind instruments, and by the end of last year, we started with a new tool, the klang.io Transcription Studio, which sort of combines all of that and enables multi-instrumental transcription, where you can just input your complete song and get all of the tracks sorted out for you.
Besides that, we also have a plug-in for your DAW, which came out, I think, last week.
It’s now available on our website. You can just drag and drop tracks or sound. You can record using your gear or using the tracks in your DAW and get some unquantized MIDI transcriptions right away.
And then you can just add another layer of VST on it. But we also started an API business. So you can also use our AI models because we were asked a lot, hey, is it somehow possible to access your AI in our product to somehow integrate it?
And therefore, we just decided, hey, we have the infrastructure, our AI is running anyway. So we can just offer that to other apps. And so it became integrated in some learning apps, for example.
Yeah. And I think that’s where the business is heading for us. We have all of these transcription apps.
We have our API, which enables B2B customers. And our ultimate goal is to just provide the best AI-based music transcriptions in the world.
And what do you see the future of the music industry? If this technology all develops the way that you hope it will, what do you think the future looks like?
Hopefully, a lot more kids are reading sheet music, because they can play all their favorite songs and get the sheet music without having to work it out by ear or transcribe it on their own.
I think the music industry becomes a little bit more digital in a way that you don’t need any sheet music printed out anymore. You don’t need or you don’t buy sheet music in PDF format or something static.
You get some digital sheet music that can evolve in the way that you like.
So it gets more interactive, which I think is also pretty nice for artists to just build another connection to their fans, maybe to play together, to evolve songs together, which I think is really amazing.
So hopefully, music doesn’t get forgotten anymore, but it is notated and engraved forever. And yeah, you can do whatever you want with it.
But that brings up some interesting questions, particularly for those of us that are music creators, and a lot of the people listening to this podcast are. There’s a lot of composers that listen to the show.
The benefits, I think, for us, are pretty clear, right? Because there is a fair amount of grunt work, if you will, involved in notating, and the AI can clearly help with that.
Yeah.
But if anyone can just take an audio recording and do whatever they want with our music, that does remove our ability to control how it gets used, it removes our ability to monetize it, potentially, you know?
And copyright law is not really equipped to deal with that. You know, on the audio side, you at least get the first recording, right? Like if you write a song, at least in the US, right?
You have the right to the first, you know, the first recording and then after that, it sort of becomes available for licensing and so on.
What would you say to artists that are concerned that this is going to take away their ability to control their music and how it gets used?
That’s a very fair point. Right now, of course, there is a big difference between edited professionally transcribed sheet music and what our AI can output.
Right now, we think more we are like an addition to the existing sheet music that is out there, that is professionally created just for you.
Of course, if you want to play your favorite song and the artist offers the sheet music, you definitely should buy it from them. But if you can’t find the sheet music out there, our tool is maybe the solution for your problem.
If the AI became perfect and created the same kind of quality as a human transcriber could create, I think all of that kind of blends. And as you already mentioned, copyright law has to evolve in some kind of way and needs to deal with that.
It is a big question mark to me, especially when you look not only at one country, but at the whole world, how this is going to be unified. I’m very curious about that.
We, as I mentioned, we are currently training with synthetic training examples, but we already did some tests with real recordings.
Right now, I’m sitting in our recording studio here in our office, where we invite musicians and pay them to create training and test audio for us. And we already found out that this really improves the quality that we can achieve.
So the next point for us is to go to the publishers, to go to the labels, and to make some sort of deal, to say, hey, you have the data, we are really interested in the data, and maybe we can use that to create the world’s best music transcription AI. What do you think about that?
And that’s something that I’m really interested in, and I’m really curious about how this will be happening and how it can be created in a fair way, so everyone gets his or her piece of the cake.
Yeah, it’s an interesting question because the business model is already struggling. Well, I guess no one knows what it’s really going to do. Let me start with this.
Do you think the AI transcription will eventually become perfect?
I mean, do you think it’s realistic to assume that someday we’ll get to the point where we can just give it a recording of, you know, The Rite of Spring and it will notate everything perfectly? Or is that not realistic?
Is it going to hit a point where it can only do so much and that’s as good as it’s going to get?
So I have a little analogy. So when we started with the piano transcriptions, we found out, hey, the first problem that we have to solve is to get all the notes detected the right way.
Once we got some good accuracy for the note detection, our customers came to us and said, hey, how about piano pedals? And then we also tried to figure out how to deal with the pedal markings.
And I think the system gets more and more complex the better it becomes. And I think we are far away from really solving the problem. So it’s really an active research topic.
And there’s also a really huge research community out there, which is typically working in the field of music information retrieval. So finding out what is happening in some sound. And I’m not sure, I’m optimistic.
Maybe we will get there in like 10, 20 years, but I also see that there’s still a lot to do. And once you are done with all your tasks, you get the next backlog with a lot of things that you have to consider next.
Yeah. Well, I think, I think that’s where a lot of the fear about AI comes from, right? Is that it will totally replace everything that musicians are doing.
And I think we kind of just assume, I kind of assume at least that we’ll eventually get there, you know, exponential growth and yada, yada, yada. And it’s not the first technology, certainly, that has caused musicians to lose work, right?
I mean, notation software probably put a lot of copyists out of work. You know, sample libraries put a lot of studio musicians out of work.
It does seem to be the pattern, though, that every time there’s a new software, a new technology developed, that a lot of musicians lose work. I mean, I make a lot of my income personally by transcribing music for people.
I know that’s a very common thing in the professional music industry. Do you think the fear is overblown? I don’t know.
AI seems different than those other technologies that have developed.
Well, I think it’s just a scheme that you see in several business fields, not only in the music industry, but also, for example, when you’re a software developer, there are a lot of AI coding tools out there that are already starting to replace developers.
And it’s some sort of disruption, but I think maybe there’s not much that you can do about it. It’s just more a shift. Your job gets transformed in another way.
You mentioned the sample libraries that got available. Okay, you have new jobs with the people that are creating these samples. So, it kind of shifts, but that’s also something where we draw a line.
We don’t want to replace the musicians, so we are not generating music, audio. We are just trying to make musicians’ lives easier. This is the point how I see it.
Will all of this training that you’re doing, teaching Klang to transcribe music, will that give it the tools to become a generative AI tool in the future and start being a composer now armed with all of this?
Every time we put in a song for them to transcribe, are we just increasing their ability to eventually compose music someday?
Maybe it will help the AI to arrange music. We also have some editing features in our software that help us figure out what our AI is doing wrong right now. It teaches the AI to transcribe or to arrange music, but not to compose.
I want to go back to what you said about the work shifting, because the point is well taken.
Finale was developed, for example. Finale is on the mind because it’s dying next week. We’re recording this right before support officially ends.
When Finale was developed, it put a lot of musicians out of work, but it was also a tool that they could learn how to use. Those hand copyists could theoretically learn how to use Finale and continue their work in a different way.
Same thing to a lesser extent with the sample libraries. I feel like the difference with AI is that the technology is so complex. How on earth am I supposed to figure out what’s going on to be able to use it?
It seems different in that way because the technology is so… It’s not just a question of learning a new program. It’s not a question of watching some YouTube tutorials and figuring out, okay, here’s how I use this.
It’s in such a different universe from how musicians are trained. And so my question to you is, if that work is going to shift, how do musicians prepare themselves to be able to take that on? What is the training that they need to have?
Where do they get the background and the understanding to be able to adapt, as you say?
Well, I think… You mentioned Finale, and of course it took some jobs, but it is not computers or a certain programming language or programming paradigm that is replacing the musicians. It is more…
It has to be captured in a program, in a tool that is made available to musicians. And that’s the same way with AI. It is not about whether AI will replace musicians or whether musicians have to work with AI.
It’s like saying musicians have to learn how programming works. It’s more like the tool that developers like klang.io built for musicians to make use of AI. So maybe there are a lot of resources out there.
For example, there is a metal composer, Tristan Bierenz is his name, also from Germany, and he does a lot of cool stuff with AI and has a lot of cool videos on his YouTube channel.
So I would definitely start on YouTube and look at what kind of resources you find. Because one thing is really, really cool about AI. AI is democratizing the way how you can interact with computers.
Because now with AI, with ChatGPT, everyone can code. Everyone with ChatGPT can create a piece of software, a mobile app or whatever you like. And that’s the thing that is awesome.
And I think you shouldn’t be afraid of AI. You should inform yourself about AI, find out what the possibilities could be and just be creative, because that’s the thing that I think AI is not capable of: being really creative.
It can find some middle way, combine building blocks, whatever. But the real creativity is for humans. And this is where you can come and shine, find out new ways how to use the AI.
Now, it’s too late for me, but if I was a student right now, if I was in school and I wanted to grow up and work for an AI music company, what is the path?
What are the things I would need to study? Is it computer science and music? Is it programming?
Is it something else? What are the qualifications that companies like yourself are looking for in applicants?
Maybe I can start by telling my own career path. I’m a hobby musician myself. I play the guitar.
I’m doing stuff with modular synthesizers. And my academic background is I studied electrical engineering, did my bachelor’s here, did my master’s, and I’m now pursuing the PhD, hoping to finish in the next month.
And we don’t really have some educational plans for the AI thing, because it’s evolving so fast, the universities don’t get behind it.
And it is good to have some ground qualifications, some basic math, linear algebra, and some basic programming skills. And then to find or to solve the missing parts with what you can find online.
Do AI courses, learn how to use software tools like PyTorch. Do projects. There’s a lot of cool stuff on GitHub that you can just tinker around with.
Maybe learn how to use the AI tools to help you code, to help you become a better music data scientist. And yeah, that’s actually the background of a lot of our employees. Right now, we are 14 in our team.
Most of them are developers and AI researchers. And most of them either studied electrical engineering, computer science, or there’s also here in Karlsruhe something called music computer science.
Well, thank you for taking the time to talk to us. It’s been really enlightening to learn about everything that’s going on. And where can our listeners find you and your company?
Just go to www.klang.io and then you find all of the cool stuff we’re doing.
And if you’re really into music tech and are looking for a new job, just reach out to us. And if you have any feedback or thoughts on AI music transcription, just drop us a message. We’re really into the topic and like to chat with people.
Well, I appreciate you taking the time and excited to see where all of this goes.
Thank you, Gerrit. All right.
