There are two main things that can take up a lot of CPU:
378 soldiers * 19 bones + 90 buildings * 2 bones = 7362 bones. Each of these needs its world transform calculated; see Bone#updateWorldTransform. It's a couple of method calls, about 8 multiplies, and a few adds per bone. With over 7k bones, it adds up.
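To make the per-bone cost concrete, here is a minimal sketch of what a 2D bone world transform computation looks like. The `SimpleBone` class and its fields are hypothetical and simplified (no shear, no inherit flags, no cached sin/cos); it is not the real spine-runtimes `Bone`, whose `updateWorldTransform` is more optimized. It only shows the shape of the work: combine the parent's world matrix with the bone's local scale, rotation, and translation.

```java
// Hypothetical, simplified 2D bone for illustration; NOT the spine-runtimes Bone class.
final class SimpleBone {
    final SimpleBone parent;
    // Local SRT, which is what animations write to.
    float x, y, rotation, scaleX = 1, scaleY = 1;
    // World transform: 2x2 matrix (a b / c d) plus world position.
    float a, b, c, d, worldX, worldY;

    SimpleBone(SimpleBone parent) { this.parent = parent; }

    /** Must be called parent-first, since it reads the parent's world transform. */
    void updateWorldTransform() {
        double rad = Math.toRadians(rotation);
        float cos = (float) Math.cos(rad), sin = (float) Math.sin(rad);
        // Local 2x2 matrix from rotation and scale.
        float la = cos * scaleX, lb = -sin * scaleY;
        float lc = sin * scaleX, ld = cos * scaleY;
        if (parent == null) {
            a = la; b = lb; c = lc; d = ld;
            worldX = x; worldY = y;
            return;
        }
        SimpleBone p = parent;
        // World position: parent matrix times local translation, plus parent position.
        worldX = p.a * x + p.b * y + p.worldX;
        worldY = p.c * x + p.d * y + p.worldY;
        // World orientation/scale: parent matrix times local matrix.
        a = p.a * la + p.b * lc;
        b = p.a * lb + p.b * ld;
        c = p.c * la + p.d * lc;
        d = p.c * lb + p.d * ld;
    }
}
```

Every frame, something like this runs once per bone, parent before child, so with the numbers above it runs 7362 times per frame.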
Animation#apply has to apply each timeline for each skeleton. There will be a timeline for each keyed bone rotation, scale, and translation, and for each keyed slot attachment change, slot color change, draw order change, and event. On average you probably have at least as many timelines as bones, maybe 1.5 times as many, so now you have about 11k timelines (7362 * 1.5 ≈ 11,000). For each one, it uses a binary search to find the keys for the current and next frames, then interpolates the value between them and applies it to the bone, slot, etc.
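The per-timeline work can be sketched like this. `FloatTimeline` is a hypothetical single-value timeline (think of one bone's rotation), not Spine's actual `Timeline` API, and it assumes plain linear interpolation, whereas real timelines also support stepped and curved keys. It shows the binary search for the surrounding keys followed by interpolation.

```java
import java.util.Arrays;

// Hypothetical single-value timeline; NOT the Spine runtime's Timeline API.
// Assumes linear interpolation between keys.
final class FloatTimeline {
    final float[] times;  // key times in seconds, strictly increasing
    final float[] values; // value at each key

    FloatTimeline(float[] times, float[] values) {
        if (times.length == 0 || times.length != values.length)
            throw new IllegalArgumentException("need matching, non-empty arrays");
        this.times = times;
        this.values = values;
    }

    /** Returns the value at the given time, interpolated between the surrounding keys. */
    float valueAt(float time) {
        if (time <= times[0]) return values[0];
        int last = times.length - 1;
        if (time >= times[last]) return values[last];
        // Binary search; on a miss it returns -(insertionPoint) - 1.
        int i = Arrays.binarySearch(times, time);
        if (i >= 0) return values[i];  // exactly on a key
        int next = -i - 1;             // first key after 'time'
        int prev = next - 1;           // last key before 'time'
        float alpha = (time - times[prev]) / (times[next] - times[prev]);
        return values[prev] + (values[next] - values[prev]) * alpha;
    }
}
```

For example, with keys at times {0, 1, 2} and values {0, 10, 30}, `valueAt(1.5f)` returns 20. Applying an animation means doing this for every timeline and writing the result into the bone or slot, about 11k times per frame in this case.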
To use less CPU, you could of course use fewer bones. However, with so many soldiers, maybe you can have them share both an animation and a skeleton. For example, for simplicity imagine you have a single skeleton for walking, and each frame you apply the walk animation to it. Any soldier that is walking is given this skeleton. When you go to draw, you just draw the walking skeleton at that soldier's location. This is waaaay more efficient because you only apply an animation to a single skeleton and only compute the world SRT (scale, rotation, translation) of a single skeleton.
Of course, that looks bad because all soldiers are animated identically. So instead, use a pool of skeletons which you randomly assign when a soldier begins walking. You could use maybe 5-20, whatever looks good. Maybe even size the pool dynamically, e.g. if you have 100 soldiers in the same state, use 10 skeletons. That would reduce the animation CPU cost by about 90%, since you update 10 skeletons instead of 100. I have a feeling no one is going to notice that 400 soldiers don't have perfectly unique animation. 🙂
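A minimal sketch of the pooling idea, assuming hypothetical stand-in types: `PoseSkeleton` represents a real skeleton plus its animation state, and `update` stands for applying the animation and computing world transforms. None of these names come from the Spine runtimes. Each pooled skeleton starts at a different point in the animation so soldiers sharing different skeletons aren't in lockstep; `sizeFor` is the "one skeleton per 10 soldiers in a state" heuristic from above. A pool of size 1 is the single shared walking skeleton.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical stand-in for a skeleton plus its animation state; NOT a Spine runtime class.
final class PoseSkeleton {
    float time;  // current animation time; pooled skeletons start at different offsets
    int updates; // how many times this skeleton's pose was computed
    PoseSkeleton(float startTime) { time = startTime; }
    // Stand-in for "apply the animation and compute world transforms".
    void update(float delta) { time += delta; updates++; }
}

// Hypothetical pool of shared skeletons for one state (e.g. walking).
final class SkeletonPool {
    final List<PoseSkeleton> skeletons = new ArrayList<>();
    final Map<Integer, PoseSkeleton> assigned = new HashMap<>(); // soldier id -> skeleton
    final Random random;

    SkeletonPool(int size, float animationDuration, long seed) {
        random = new Random(seed);
        for (int i = 0; i < size; i++)
            skeletons.add(new PoseSkeleton(animationDuration * i / size));
    }

    /** Heuristic: roughly one skeleton per 10 soldiers in the same state, at least one. */
    static int sizeFor(int soldiersInState) {
        return Math.max(1, soldiersInState / 10);
    }

    /** Called when a soldier enters this state: pick a random shared skeleton. */
    PoseSkeleton assign(int soldierId) {
        PoseSkeleton s = skeletons.get(random.nextInt(skeletons.size()));
        assigned.put(soldierId, s);
        return s;
    }

    /** Per frame, only the pooled skeletons are updated, not one per soldier. */
    void update(float delta) {
        for (PoseSkeleton s : skeletons) s.update(delta);
    }

    /** The skeleton whose pose should be drawn at this soldier's location. */
    PoseSkeleton skeletonFor(int soldierId) { return assigned.get(soldierId); }
}
```

With 100 walking soldiers, `sizeFor(100)` gives 10, so each frame does 10 skeleton updates instead of 100. When drawing, each soldier renders its assigned skeleton's current pose at its own position.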