BOSS: Bones, Organs and Skin Shape Model
Karthik Shetty¹,², Annette Birkhold², Srikrishna Jaganathan¹,², Norbert Strobel², Bernhard Egger¹, Markus Kowarschik², Andreas Maier¹
¹ FAU Erlangen-Nürnberg, Erlangen, Germany
² Siemens Healthineers AG, Forchheim, Germany
arXiv:2303.04923v1 [cs.CV] 8 Mar 2023
With the availability of a trained SMPL model, the registration process is simplified due to the preexistence of an SSM. Hence, the pose and shape can be optimized simultaneously by minimizing $E_{lm} + E_{data}$, where $E_{lm}$ is the landmark loss (Eq. 4) and $E_{data}$ the data loss (Eq. 2).

$$E_{data} = \lambda_{d1} E_d(S_s^i, M_s) + \lambda_{d2} E_d(M_s, S_s^i) \tag{2}$$

$$E_d(S, M(\beta, \theta, t; \Phi)) = \sum_{m_j \in (M)} p_j \, \rho \, \lVert m_j - N(S) \rVert_2^2 \tag{3}$$

$$E_{lm} = \sum_{l_j \in L(M)} \lVert l_j - L_j \rVert_1 \tag{4}$$

The data loss accounts for the distance between the skin SMPL model $M$ and the surface $S_s^i$. As correspondence is not present implicitly, we select the nearest neighbour $N$ in

We optimize the following energy term

$$\arg\min_{\beta, \theta, t} \left( E_{lm} + E_{data} + E_\theta + E_\beta + E_w + E_h \right), \tag{6}$$

to obtain an initial fit $\tilde{V}_s^i$. The weights $\lambda_x$ associated with each energy term $E_x$ are omitted for readability. The initial fit is obtained by first optimizing the global translation $t$, followed by the pose $\theta$, the shape $\beta$, and finally all three parameters simultaneously.

If the scans $S_s^i$ were in a standing pose, this registration would suffice, under the assumption that the SMPL model represents an adequate space of human shape variations. However, the bodies we would like to model are in a supine position, as scanned during a CT procedure. This usually causes the back to flatten, the stomach to be depressed, and the chest to bulge out. Hence, we perform non-rigid registration on the initial fit $\tilde{V}_s^i$. Similar to [24,37], we represent a set of $3 \times 4$ affine transformation matrices $A_j^i$ associated with each vertex of the initial fit $\tilde{V}_{s,k}^i$, with the aim of aligning the vertices to the scan $S_s^i$. This is achieved by minimizing $E_{lm} + E_{data} + E_s + E_o$, where $E_s$ and $E_o$ are smoothing and orthogonality constraints [37], respectively. To achieve local rigidity, the affine transformations applied to the vertices $\tilde{V}_{s,k}^i$ need to be close to the transformations on the neighbouring vertices $\tilde{V}_{s,k}^i \in N(\tilde{V}_{s,j}^i)$. Therefore, the smoothness term $E_s(\tilde{V}_s^i)$ can be defined as

$$E_s(p) = \sum_{\{j,k \mid \{p_j, p_k\} \in \mathrm{edges}(p)\}} c_{ij} \lVert A_j p_j - A_k p_k \rVert_2^2. \tag{7}$$
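As an illustration of the smoothness term in Eq. 7, the following minimal Python sketch evaluates the sum exactly as printed, using plain Python without dependencies. The function names, the per-edge weight list (standing in for the weight $c_{ij}$), and the toy triangle are assumptions made for this example; they are not taken from the authors' implementation.

```python
def apply_affine(A, p):
    """Apply a 3x4 affine matrix A (3 rows of 4 floats) to a 3D point p."""
    x, y, z = p
    return [row[0] * x + row[1] * y + row[2] * z + row[3] for row in A]


def smoothness_energy(affines, points, edges, weights=None):
    """Sum of c * ||A_j p_j - A_k p_k||_2^2 over all mesh edges (j, k)."""
    total = 0.0
    for idx, (j, k) in enumerate(edges):
        # Hypothetical per-edge weight; defaults to 1 when none is given.
        c = 1.0 if weights is None else weights[idx]
        qj = apply_affine(affines[j], points[j])
        qk = apply_affine(affines[k], points[k])
        total += c * sum((a - b) ** 2 for a, b in zip(qj, qk))
    return total


# Toy example (assumed data): one triangle with identity transforms.
IDENTITY = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0]]
points = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
edges = [(0, 1), (1, 2), (0, 2)]

energy_identity = smoothness_energy([IDENTITY] * 3, points, edges)
```

In this sketch, translating every vertex by the same offset leaves the energy unchanged, whereas moving a single vertex away from its neighbours increases it, which is the behaviour the local-rigidity argument relies on.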
Here $R_j$ is the closest projection of $A_j$ onto the rotation matrix group. This can be extracted by performing Singular Value Decomposition on the transformation matrix. All energies are minimized to obtain a final non-rigid fit $V_s^i$ using a gradient-based L-BFGS [39] minimization method, making use of automatic differentiation packages.

The main advantage of the two-step registration process is that it can handle scans with missing data or holes. Missing data here refers to the non-availability of scan sections such as the arms or legs in the partial-body dataset. Missing data is identified by the non-availability of landmarks for a given scan. As the SMPL model is divided into 24 sections, we can prevent pose deformation and data loss minimization on those sections. Using an SSM reduces the search space and keeps the result within the realm of probable human shapes.

2.4. Bone-Organ Model

Unlike the skin model, publicly available SSMs for bones and organs do not exist. For this purpose we create a deformable model from scratch. A template mesh is derived from the existing polygon dataset BodyParts3D, which was extracted from full-body MRI images [40]. The bone model is made up of 70 segments, including skull, femur, humerus, forearm, lower leg, scapula, clavicle, sternum, hands, feet, vertebra, ribs, and pelvis. The organ model includes lungs, liver, kidneys, spleen, heart, bladder, rectum, esophagus, and aorta. We also incorporate the bowel region containing the stomach and intestines. However, segmentations of the individual bowel components are not available; only a hull enclosing the stomach and intestines is. The entire bone-organ template is made up of 104,546 vertices and 209,418 faces, of which 65,617 vertices belong to the bones and the remainder to the internal organs.

On top of the template, we define a kinematic chain made up of $N_b = 63$ joints comprising 63 segments. Though we start with 70 individual segments, we consider femur-patella and all cervical vertebrae as combined segments. Linear blend skinning is adopted on the femur-patella-tibia and cervical sections to achieve a smooth deformable bone model. For rigid entities, the blend weights are set to 1 with respect to their own segment, whereas for composite structures they are evenly distributed between the parent and child segments. The initial blend weights for the organs are set only with respect to the vertebral section. We make use of Blender [41] to automatically generate these weights.

We rigidly deform the vertebra of the bone-organ mesh model to one of the segmented CT volumes of comparable shape and size, such that the mesh represents a person lying in supine pose. Similarly, we define the skin template $\tilde{T}_s$ in supine pose by re-posing the non-rigid skin model to a T-pose of the same CT volume. Additionally, we rotate the arms and legs of the bone-organ model such that they lie inside the skin and follow the same T-pose as the skin from the SMPL model. The final bone-organ template mesh $\tilde{T}_{bo}$ is shown in Fig. 2.

2.4.1 Bone Registration

The registration process in general follows the methodology described in Sec. 2.3. However, estimating the rough pose followed by non-rigid registration is not feasible because of the complex thin structure of bones. This problem is particularly evident on the scapula and clavicle, and it leads to incorrect poses for the rest of the template. To address this, we simultaneously estimate a rough shape and pose. The shape variations are achieved by applying a scale transform along a segment in world coordinates. Consequently, the joint locations along the kinematic chain are also scaled by the same amount.

Hereby a simplistic deformable bone model can be expressed as $M^b(\hat{\beta}^b, \theta^b, t^b; \Phi^b)$, where $\hat{\beta}^b \in \mathbb{R}^{63 \times 3}$ represents the scaling parameters, $\theta^b \in \mathbb{R}^{63 \times 3}$ represents the pose parameters, $t^b \in \mathbb{R}^{63 \times 3}$ represents the individual segment translation parameters, and $\Phi^b$ represents the model parameters comprising the kinematic chain and the initial blend weights.

generate an estimated fit of the bones in the case of missing data around the leg and arm regions. Here, we use a learnt mapping between the skin vertices and the skeleton joints $J(M_s)$, in particular for the arms, legs, hands and feet. The mapping is learnt on the set of registered scans where the aforementioned sections were present in the CT data. Using the registered skin as a reference, we minimize the loss between the predicted joint locations $J(M_s)$ and the joint locations obtained from the skeleton model $M^b(\beta^b, \theta^b)$.

$$E_{data} = \sum_{\{s_{mk}, s_{sk} \in \mathrm{segments}(M_b, S_b)\}} \left( \lambda_{d1} E_d(s_{mk}, s_{sk}) + \lambda_{d2} E_d(s_{sk}, s_{mk}) \right) \tag{10}$$
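The segment-wise data term in Eq. 10 can be illustrated with a simplified Python sketch. It follows the argument order of Eq. 3, where $E_d(S, M)$ sums over the points of $M$ and looks up their nearest neighbours in $S$, but it deliberately omits the per-point factor $p_j$ and the function $\rho$, which are not defined in this section. The brute-force neighbour search, the dictionary layout keyed by segment name, and the choice to skip segments absent from the scan are assumptions made for this example.

```python
def nearest_sq_dist(point, cloud):
    """Squared distance from point to its nearest neighbour in cloud (brute force)."""
    return min(sum((a - b) ** 2 for a, b in zip(point, q)) for q in cloud)


def e_d(S, M):
    """Simplified one-sided term: sum over m in M of ||m - N(S)||^2."""
    return sum(nearest_sq_dist(m, S) for m in M)


def segment_data_loss(model_segments, scan_segments, lam_d1=1.0, lam_d2=1.0):
    """Bidirectional data term accumulated over matching segments."""
    total = 0.0
    for name, model_pts in model_segments.items():
        scan_pts = scan_segments.get(name)
        if not scan_pts:
            # Assumption: a segment missing from the scan contributes nothing.
            continue
        total += lam_d1 * e_d(model_pts, scan_pts)
        total += lam_d2 * e_d(scan_pts, model_pts)
    return total


# Toy example (assumed data): the scan contains a femur but no humerus.
model_segments = {
    "femur": [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    "humerus": [[5.0, 0.0, 0.0]],
}
scan_segments = {"femur": [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]}

loss = segment_data_loss(model_segments, scan_segments)
```

Restricting the nearest-neighbour search to the matching segment keeps a thin structure such as the clavicle from being attracted to a neighbouring bone. A practical implementation would likely replace the brute-force search with a spatial index.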
represents the normalized direction vector for a pair of neighbouring vertices. We further add constraints on $\tilde{\theta}_s^i$, $J_s$, $W_s$ and $BP_s$ in the form of an L2 loss, so that they do not deviate too much from their initial values. In the original SMPL model, the joint regressor was computed using non-negative least squares, with a constraint that the weights add up to 1. We maintain a similar setting during the joint optimization by normalizing

$$\arg\min_{\tilde{\theta}_{bo}^i, \bar{\theta}_{bo}^i, J_{bo}, W_{bo}} \sum_i \lVert P(\bar{U}_{bo}^i, U_s^i) - P(\bar{V}_{bo}^i, V_s^i) \rVert \tag{14}$$

We start off by initialising the joint regressor $J_{bo}$ on the template bone-organ model $T_{bo}$ based on the joint locations from its initial kinematic chain. The joints are always located between two bone segments. We randomly sample 50 closest vertices to the joint from both segments, where the vertex normals approximately face the vertex-joint direction. The joint regressor $J_{bo}$ is learnt using a least-squares fit for the sampled vertices. The rest pose vertices $U_{bo}^i$ are initialized as $M^{bo}(\hat{\beta}_b^i)$ with only the scale parameters $\hat{\beta}_b^i$ obtained from the initial rigid bone fit. Similar to the skin model, the blend weights $W_{bo}$ and poses $\tilde{\theta}_{bo}^i$ are initialized.

Figure 5. The first three principal components of body shape (PC1, PC2, PC3), each varied from $-2\sigma$ to $2\sigma$ after normalizing the variance. One could infer that the height and weight of the patient are mostly explained by the first two components.

During unposing of the bone-organ model, we define the motion of the ribs, sternum and pelvis to be a function of shape rather than of pose. Hence, we initialize the pose to zero for these particular segments. Note that only the ribs and sternum are leaf nodes in our kinematic chain, i.e. they have no child segments. However, for the femur, we additionally include the pelvis rotation, as the pelvis is its parent node.

To stabilize the optimization, we make use of regularizers similar to those defined for the skin model in both objective functions. For the edge loss, we additionally incorporate the virtual edges from the registration process with a lower weight to reduce mesh interpenetration. The symmetry regularizer is applied only to the bone structures. The joint locations of bone and skin along the arms and legs are regularized by minimizing

$$\lVert J_{bo}(U_{bo}^i) - J_s(U_s^i) \rVert_2.$$

We alternate between optimizing Eq. 13 and Eq. 14, while carrying the optimized parameters between them. For the optimization of Eq. 14, we initialize the pose $\bar{\theta}_{bo}^i$ with zero and regularize it towards zero. When switching back to Eq. 13, we initialize the unposed vertices $U_{bo}^i$ with the vertices $\bar{U}_{bo}^i$ obtained after optimizing Eq. 14.

2.5.3 Shape Space

From the unposed skin $U_s^i$ and bone-organ $U_{bo}^i$ volumes, we learn the shape space in the form of a mean shape and principal shape components. We do not have complete registrations around the skull, arms and legs for some of the volumes. Hence, we use a publicly available implementation¹ of Probabilistic Principal Component Analysis (PPCA) [42], which can handle missing data. By performing PPCA, we obtain a mean skin $T_s^\mu$, a mean bone-organ $T_{bo}^\mu$, and vertex offsets to the mean in the form of shape spaces $B_s^\mu$ for skin and $B_{bo}^\mu$ for bone-organ.
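To make the role of the mean and the vertex offsets concrete, the sketch below synthesizes a shape as the mean plus a weighted sum of principal-component offsets, with each coefficient expressed in standard deviations. This is only the linear synthesis step, not PPCA itself. The function name, the separate per-component standard deviations, and the two-vertex toy data are assumptions made for this example.

```python
def synthesize_shape(mean_vertices, components, sigmas, betas):
    """Return mean + sum_k betas[k] * sigmas[k] * components[k], per vertex.

    mean_vertices: list of [x, y, z] positions of the mean shape
    components:    list of offset fields, each shaped like mean_vertices
    sigmas:        standard deviation of each component (assumed given)
    betas:         shape coefficients in units of standard deviations
    """
    shape = [list(v) for v in mean_vertices]  # copy so the mean stays intact
    for comp, sigma, beta in zip(components, sigmas, betas):
        for vertex, offset in zip(shape, comp):
            for axis in range(3):
                vertex[axis] += beta * sigma * offset[axis]
    return shape


# Toy example (assumed data): one component that stretches the shape along y.
mean = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
components = [[[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]]]
sigmas = [0.5]

plus_two_sigma = synthesize_shape(mean, components, sigmas, [2.0])
minus_two_sigma = synthesize_shape(mean, components, sigmas, [-2.0])
```

Evaluating the function at coefficients of $-2$ and $2$ for a single component, with all others at zero, corresponds to the kind of variation visualized per principal component in Fig. 5.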
3. Evaluation
In Fig. 5, we visualize the first three shape components,
while Fig. 6 displays the cumulative variance of the full