We recently gathered some Practical AI listeners for a live webinar with Danny from LibreChat to discuss the future of private, open source chat UIs. During the discussion we heard about the motivations behind LibreChat, why enterprise users are hosting their own chat UIs, and how Danny (and the LibreChat community) is creating amazing features (like RAG and plugins).
Daniel Whitenack: So we started exploring LibreChat at Prediction Guard because a bunch of our customers who are using Prediction Guard wanted a private chat interface. Prediction Guard itself is a platform that allows you to run large language models in a private, secure environment, with safeguards around them for things like factuality, toxicity, prompt injections, and a bunch of other things. And so our customers are all those kind of privacy-focused, security-conscious customers who are maybe running Prediction Guard on their own infrastructure and want a private chat interface for the models that they’re hosting with Prediction Guard, or who want an interface that’s not a closed one for using our models. And so here, what you can see is we’ve taken LibreChat, which again, Danny mentioned is open source, and we’ve been able to bring it under our own branding… And we have Prediction Guard here where you can set your API key, and use Prediction Guard running on top of our platform. And because it’s open source, because it’s transparent, we’re able to take this and integrate our own sort of flair into it.
An engineer from our team, Ed, worked together with Danny on this - so thanks for that - and we were able to integrate some of these checks for things like toxicity, and integrate our various models into the mix. So still, kind of like Danny was showing in terms of running here - I’m running with Neural-Chat 7B; this is running in a privacy-preserving setup in Intel’s AI cloud on Gaudi 2 infrastructure. So it’s a very unique setup that we’ve kind of optimized… And we’re able to connect to our own model and use this really slick interface, which is LibreChat, just branded a bit with our colors and logos and that sort of thing… But also, we can integrate the unique features of our take on an AI system, right? So let’s say I’m really concerned - because I’m using an open model that doesn’t have some of the guardrails around it like closed source models do - I can go into the config here and turn on a toxicity filter to make sure that the model isn’t cursing me out, or giving me any sort of stuff that I don’t want to see. And so here you can see we have a little toxicity score… Thankfully, it wasn’t very toxic this time around. So continuing… Similar to what Danny was showing, but again, our own take on that, with our models, and kind of the safeguards around that.
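The toxicity-filter flow described here - the model reply comes back with a toxicity score, and the UI withholds replies that score too high - can be sketched roughly as below. Note that the response shape, field names, and threshold are hypothetical illustrations, not Prediction Guard's actual API; check their documentation for the real interface.

```python
# Sketch of gating a chat reply on a toxicity score, in the spirit of the
# Prediction Guard + LibreChat demo. The payload shape below ("output",
# "checks.toxicity") is hypothetical, not the real API schema.

TOXICITY_THRESHOLD = 0.5  # assumed 0-1 scale; block replies above this


def gate_reply(response: dict, threshold: float = TOXICITY_THRESHOLD) -> str:
    """Return the model reply, or a refusal message if it scores too toxic."""
    score = response.get("checks", {}).get("toxicity", 0.0)
    if score > threshold:
        return f"[reply withheld: toxicity score {score:.2f} over threshold]"
    return response["output"]


# Hypothetical responses a chat UI might receive:
clean = {"output": "Here is a summary of your document.",
         "checks": {"toxicity": 0.02}}
toxic = {"output": "...", "checks": {"toxicity": 0.91}}

print(gate_reply(clean))  # passes through
print(gate_reply(toxic))  # withheld
```

The point is just that the check runs server-side alongside the model, and the UI only needs to branch on the returned score.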
[00:22:13.08] One cool thing that we’ve found really useful is that a lot of our customers, they want an interface like this, but they also want it authenticated, to fit into their existing system setup… So we’ve integrated – we’re a G Suite company, so we’ve integrated Google login here… And it’s only our org that can log in, so the Prediction Guard org, and now I’m authenticated. Here’s my chat, like Danny mentioned, that is private and searchable…
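For reference, LibreChat supports social login through environment variables; a sketch of the relevant `.env` entries is below (variable names may change between versions - check LibreChat's `.env.example` for the current ones). Restricting sign-in to a single Workspace org, as described above, is typically done on the Google side by publishing the OAuth app as "Internal".

```shell
# Hedged sketch of LibreChat Google login config (.env);
# confirm names against LibreChat's .env.example.
ALLOW_SOCIAL_LOGIN=true
ALLOW_REGISTRATION=false
GOOGLE_CLIENT_ID=your-client-id.apps.googleusercontent.com
GOOGLE_CLIENT_SECRET=your-client-secret
GOOGLE_CALLBACK_URL=/oauth/google/callback
```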
So yeah, this has been a really amazing thing for us, where we’ve been able to take and build on the great open source stuff that Danny has built at LibreChat, and create something that works really well for our customers and for our setup. So before I leave and stop screen sharing, I saw that there was a question earlier on about translation with language models. A lot of what we’ve been showing is English; some language model providers like OpenAI say that they’ll do other languages, but that doesn’t always work out.
So we have a translate endpoint in our API, and so we’ve done a bit of this testing with large language model translation, and kind of standard translation systems like Google Translate, and Bing Translate, and others… Or even other models, like NLLB, No Language Left Behind, from Meta. And with our translate endpoint, you can send text to translate and then actually get the results along with a score. So we’re using COMET scoring, which is a way to score translations… And I think the question was how well do large language models translate and are able to chat in different languages versus machine translating with a commercial translation system.
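The idea of scoring several translation candidates and returning the best one can be sketched like this. The candidate list and response shape are hypothetical, not the actual translate endpoint schema; only the "higher COMET score is better" convention is taken from the discussion.

```python
# Sketch of picking the best translation candidate by COMET-style score,
# in the spirit of the translate endpoint described above. The payload
# shape here is hypothetical.


def best_translation(candidates: list[dict]) -> dict:
    """Return the candidate with the highest quality score (higher = better)."""
    return max(candidates, key=lambda c: c["score"])


# Hypothetical scored candidates from different systems:
candidates = [
    {"model": "llm",    "translation": "...", "score": 0.71},
    {"model": "google", "translation": "...", "score": 0.84},
    {"model": "nllb",   "translation": "...", "score": 0.79},
]

print(best_translation(candidates)["model"])
```

This mirrors the comparison described next: each system's output gets a reference-free quality score, and you can rank systems per language rather than trusting any one of them blindly.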
So what we’ve seen in scoring both commercial translation systems and large language models is that some large language models, depending on the language - like, if you’re going into Hindi with OpenAI - you might get a good translation, or one that is comparable to Google Translate, a small amount of the time, like 5% to 10%. But mostly, the commercial translation systems are generally better. And definitely, as you go down the longer tail of languages, it gets worse and worse. Even chatting in like Mandarin, a lot of models don’t do so well, even though that’s kind of the next-highest represented language in the datasets out there. So yeah, it’s definitely a mixed bag there. I don’t know if Danny or Chris, if you have a comment on that before we go to other questions, but…