I know what you’re thinking: “Whoa, that sounds technical; this isn’t for me.” But stick with me, this is FOR YOU! Last week, I gave a technical talk on the DeepSeek-R1 Large Language Model (LLM), and I realized that most of my excitement about the techniques it enables is non-technical. I’m getting ahead of myself, though; let’s start from the beginning.
DeepSeek-R1 was released in January with a big splash by DeepSeek, a Chinese research company backed by the hedge fund High-Flyer. Reasoning models are different from traditional LLMs because they can not only answer questions but have also been trained to think a problem through. When an LLM thinks through a problem with an inner monologue, it makes fewer mistakes, especially on problems that require connecting multiple concepts or performing calculations. DeepSeek-R1 was a significant release for many reasons, but most importantly:
- This is the first reasoning model to be released as open source, opening up many avenues for customizing it.
- It is dramatically less expensive than OpenAI’s o1 model (~5% of the cost) while maintaining similar capabilities.
- The training focused on automated Reinforcement Learning, reducing the amount of manual Fine-Tuning needed.
- Distillations were a primary focus of the research, with a method available on release.
The cost of deploying Large Language Models in an enterprise environment can already become expensive when scaled across an entire user base, and reasoning models are dramatically more expensive still. Let’s look at an example, starting with OpenAI’s GPT-4o.
OpenAI GPT-4o
Let’s pretend that we are deploying a solution to our entire user base. Each interaction is about 300 input tokens and 200 output tokens, and each user has 100 interactions per month. At GPT-4o’s published rates ($2.50 per million input tokens and $10 per million output tokens), the monthly cost per user is $0.275. Imagine that our user base is small, only 1,000 users. The total LLM cost for this solution is $275 per month.
OpenAI o1
Let’s now pretend that we realize we need a reasoning model to get the desired behavior. You might imagine the cost simply becomes about six times as much, since o1’s rates ($15 per million input tokens and $60 per million output tokens) are six times GPT-4o’s; however, it isn’t that simple. Reasoning models think out loud and tend to generate far more tokens. Let’s imagine that we ask the same question, and the interaction still has 300 input tokens, but we now generate 1,000 output tokens. Doing all the math, our monthly cost for this solution is $6,450. Whoa!
DeepSeek-R1
DeepSeek-R1 provides similar functionality to OpenAI o1 at a fraction of the cost: roughly $0.55 per million input tokens and $2.19 per million output tokens. Doing all the math again, our monthly cost for this solution is about $236!
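If you’d like to sanity-check these numbers, here’s a quick back-of-the-envelope script. The prices baked in below are the published per-million-token API rates used in the math above; treat them as assumptions, since rate cards change frequently.

```python
# Back-of-the-envelope monthly LLM cost for the scenarios above.
# Prices are USD per 1M tokens and WILL drift; check current rate cards.
PRICES = {                     # (input, output) per 1M tokens
    "GPT-4o":      (2.50, 10.00),
    "o1":          (15.00, 60.00),
    "DeepSeek-R1": (0.55, 2.19),
}

USERS = 1_000
INTERACTIONS_PER_MONTH = 100

def monthly_cost(model: str, in_tokens: int, out_tokens: int) -> float:
    """Total monthly cost across the whole user base, in USD."""
    in_price, out_price = PRICES[model]
    per_interaction = (in_tokens * in_price + out_tokens * out_price) / 1_000_000
    return per_interaction * INTERACTIONS_PER_MONTH * USERS

print(monthly_cost("GPT-4o", 300, 200))        # -> 275.0
print(monthly_cost("o1", 300, 1000))           # -> 6450.0
print(monthly_cost("DeepSeek-R1", 300, 1000))  # -> 235.5
```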
The price difference between DeepSeek’s and OpenAI’s reasoning models is stunning. Sadly, it doesn’t tell the whole story for an enterprise. Because of the size of these models and the additional time spent thinking through a problem, responses can take a long time to generate; reasoning models often take many minutes to complete a response. That puts them out of reach for many solutions, since response times become unreasonable across a large user base. I believe this high response time is the reason we don’t yet see reasoning models in many custom solutions. As a user, I don’t care what the solution costs, but if I have to wait minutes for a response, I’m going to be upset, even if it’s solving something that would take me longer on my own.
Distillation is the final piece of the puzzle: it not only dramatically lowers costs, it also lowers execution time. I’m getting ahead of myself again, but we’re almost there!
Model distillation is where we take a very large model and distill its knowledge and reasoning capabilities into a Small Language Model (SLM). The full DeepSeek-R1 model has 671 billion parameters; a popular distillation of it has 32 billion. Not only does this lower the cost roughly proportionally, it also lowers execution time roughly proportionally. A response that takes 5 minutes to generate now takes about 14 seconds on the same hardware! That may sound too good to be true, and truthfully, it is. When we distill the larger model into an SLM, we lose some accuracy and result quality. The neural network is about 5% the size of the full model, which means some things are going to be lost and some details will disappear from its memory. We’ve landed on fast, cheap, and mediocre quality. Not exactly enticing just yet…
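To make the proportional claim concrete, here’s the rough arithmetic. The assumption that execution time scales roughly linearly with parameter count is a simplification, but it’s good enough for a first-order estimate:

```python
# Rough scaling estimate: 671B full model vs. a 32B distillation.
full_params = 671e9       # DeepSeek-R1
distilled_params = 32e9   # a popular distillation

ratio = distilled_params / full_params
print(f"{ratio:.1%}")                # -> 4.8%, i.e. ~5% of the full size
print(f"{300 * ratio:.0f} seconds")  # a 5-minute response -> ~14 seconds
```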
What’s the missing piece?
The missing piece of the puzzle is that we can create our own distillations of DeepSeek-R1. Why would we want to? Imagine creating your own distillation of DeepSeek, but instead of trying to cram all of the original model’s knowledge into a smaller model, we distill only the business-specific knowledge needed to solve your particular problem and discard the rest. In fact, we can go a step further and add in domain-specific knowledge that wasn’t in the original model, all while maintaining its reasoning capabilities. Now we have fast, cheap, and excellent quality, but only for the problem we’re trying to solve.
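To ground the idea, here’s a minimal, self-contained sketch of classic teacher-student distillation on toy models. Everything here is a stand-in: the teacher, the student, and the random data are hypothetical, and real LLM distillation (DeepSeek’s released distillations, for instance, were produced by fine-tuning smaller models on reasoning traces generated by R1) involves far more machinery.

```python
# A toy illustration of knowledge distillation: a small "student" network
# learns to match the softened output distribution of a larger "teacher".
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-ins for a large model and a much smaller one.
teacher = nn.Sequential(nn.Linear(16, 256), nn.ReLU(), nn.Linear(256, 8))
student = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0  # temperature: softens the teacher's distribution

for step in range(200):
    x = torch.randn(64, 16)            # toy input batch
    with torch.no_grad():
        teacher_logits = teacher(x)    # the "knowledge" being transferred
    student_logits = student(x)

    # KL divergence between softened teacher and student distributions,
    # scaled by T^2 to keep gradients comparable across temperatures.
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * T**2

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The temperature is the interesting design choice: softening the distributions exposes how the teacher ranks the wrong answers relative to each other, which carries much of the knowledge the student is trying to absorb.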
This is what has me excited about DeepSeek-R1: with it, we can solve problems that were previously out of reach due to cost or execution time, and solve them with a high-quality reasoning model that is far superior to any regular LLM!
If you would like to learn more about how we can apply cutting-edge Reasoning Models, or any other form of AI/ML, reach out to me at cvetter@newresources.com; I’d love to chat and understand how we can help you save time and money with Artificial Intelligence.


