<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://rjauslin.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://rjauslin.github.io/" rel="alternate" type="text/html" /><updated>2025-05-21T07:18:43+00:00</updated><id>https://rjauslin.github.io/feed.xml</id><title type="html">Raphaël Jauslin</title><subtitle>Raphael&apos;s webpage</subtitle><author><name>Raphaël Jauslin</name></author><entry><title type="html">Password generator</title><link href="https://rjauslin.github.io/mdp/" rel="alternate" type="text/html" title="Password generator" /><published>2025-05-20T00:00:00+00:00</published><updated>2025-05-20T00:00:00+00:00</updated><id>https://rjauslin.github.io/generateur-mdp</id><content type="html" xml:base="https://rjauslin.github.io/mdp/"><![CDATA[<h2 id="password">Password</h2>

<div style="position: relative; padding-bottom: 65%; height: 0; overflow: hidden;">
  <iframe src="https://mdp-uvt3.onrender.com" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border: none;">
  </iframe>
</div>]]></content><author><name>Raphaël Jauslin</name></author><summary type="html"><![CDATA[Password generator]]></summary></entry><entry><title type="html">Deville systematic</title><link href="https://rjauslin.github.io/deville/" rel="alternate" type="text/html" title="Deville systematic" /><published>2022-07-06T00:00:00+00:00</published><updated>2022-07-06T00:00:00+00:00</updated><id>https://rjauslin.github.io/deville</id><content type="html" xml:base="https://rjauslin.github.io/deville/"><![CDATA[<h2 id="introduction">Introduction</h2>

<p>Deville's systematic design is an unequal probability sampling method proposed by Jean-Claude Deville in 1998. Like classical systematic sampling, it works on the cumulated inclusion probabilities. Instead of a single random start, however, it uses one random number in each unit interval, which gives it different second-order inclusion probabilities. <a href="https://doi.org/10.3150/11-BEJ380">Chauvet (2012)</a> showed that Deville's systematic sampling and the ordered pivotal method are the same sampling design.</p>
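<p>To make the equivalence concrete, here is a minimal sketch of the ordered pivotal method. It assumes the usual pivotal step: two units fight over their combined probability mass until one of them is set to 0 or 1, with the survivor carried on to face the next unit in the list order. The function name <code class="language-plaintext highlighter-rouge">ordered_pivotal</code> is hypothetical. This is an illustration only, not the code behind <code class="language-plaintext highlighter-rouge">sys_deville</code> or <code class="language-plaintext highlighter-rouge">spm</code>.</p>

<figure class="highlight"><pre><code class="language-r" data-lang="r"># Hypothetical sketch of the ordered (sequential) pivotal method.
# Each step keeps the expected value of both probabilities unchanged,
# so the first-order inclusion probabilities are respected.
ordered_pivotal = function(pik, eps = 1e-9) {
  p = pik
  N = length(p)
  cur = 1                          # undecided unit carried forward
  for (j in seq_len(N)[-1]) {
    a = p[cur]
    b = p[j]
    s = a + b
    if (1 > s) {
      # one unit receives the whole mass s, the other is set to 0
      if (b > runif(1) * s) { p[j] = s; p[cur] = 0 } else { p[cur] = s; p[j] = 0 }
    } else {
      # one unit is selected (set to 1), the other keeps the remainder s - 1
      if (1 - b > runif(1) * (2 - s)) { p[cur] = 1; p[j] = s - 1 } else { p[cur] = s - 1; p[j] = 1 }
    }
    # once the carried unit is decided, unit j becomes the carried unit
    if (p[cur] == 0 || p[cur] == 1) cur = j
  }
  which(p > 1 - eps)               # indices of the selected units
}</code></pre></figure>

<p>When the probabilities sum to an integer n, every step preserves the total mass, so the function returns exactly n indices (up to floating-point tolerance).</p>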

<p>This vignette shows how to use the functions <code class="language-plaintext highlighter-rouge">sys_deville</code>, which selects a sample, and <code class="language-plaintext highlighter-rouge">sys_devillepi2</code>, which computes the exact second-order inclusion probabilities. A small simulation then checks that these exact probabilities match the empirical pair frequencies obtained with <code class="language-plaintext highlighter-rouge">sys_deville</code> and with the function <code class="language-plaintext highlighter-rouge">spm</code> from the <code class="language-plaintext highlighter-rouge">BalancedSampling</code> package, which implements the ordered pivotal method.</p>

<h2 id="generating-data">Generating Data</h2>

<p>We generate unequal inclusion probabilities proportional to uniform random draws, for a population of N = 20 units and a sample size of n = 3.</p>

<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">sampling</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">StratifiedSampling</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">BalancedSampling</span><span class="p">)</span></code></pre></figure>

<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">set.seed</span><span class="p">(</span><span class="m">1</span><span class="p">)</span><span class="w">
</span><span class="n">N</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="m">20</span><span class="w">
</span><span class="n">n</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="m">3</span><span class="w">
</span><span class="n">pik</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">inclusionprobabilities</span><span class="p">(</span><span class="n">runif</span><span class="p">(</span><span class="n">N</span><span class="p">),</span><span class="n">n</span><span class="p">)</span></code></pre></figure>

<h2 id="simulations">Simulations</h2>

<p>To check that the second-order inclusion probabilities are computed correctly, we draw 100,000 samples with each method. We then estimate the second-order inclusion probability of each pair of units by the proportion of simulated samples that contain both units.</p>

<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">SIM</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="m">100000</span><span class="w">
</span><span class="n">PI_1</span><span class="w"> </span><span class="o">&lt;-</span><span class="w">  </span><span class="n">PI_2</span><span class="w"> </span><span class="o">&lt;-</span><span class="w">  </span><span class="n">matrix</span><span class="p">(</span><span class="nf">rep</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="n">N</span><span class="o">*</span><span class="n">N</span><span class="p">),</span><span class="n">ncol</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">N</span><span class="p">,</span><span class="n">nrow</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">N</span><span class="p">)</span><span class="w">

</span><span class="k">for</span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">1</span><span class="o">:</span><span class="n">SIM</span><span class="p">){</span><span class="w">
  
  </span><span class="n">s1</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">BalancedSampling</span><span class="o">::</span><span class="n">spm</span><span class="p">(</span><span class="n">pik</span><span class="p">)</span><span class="w">
  </span><span class="n">s1_01</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="n">N</span><span class="p">)</span><span class="w">
  </span><span class="n">s1_01</span><span class="p">[</span><span class="n">s1</span><span class="p">]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="m">1</span><span class="w">
  
  </span><span class="n">s2</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">sys_deville</span><span class="p">(</span><span class="n">pik</span><span class="p">)</span><span class="w">
  </span><span class="n">s2_01</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="n">N</span><span class="p">)</span><span class="w">
  </span><span class="n">s2_01</span><span class="p">[</span><span class="n">s2</span><span class="p">]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="m">1</span><span class="w">
  </span><span class="n">PI_1</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">PI_1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">s1_01</span><span class="o">%*%</span><span class="n">t</span><span class="p">(</span><span class="n">s1_01</span><span class="p">)</span><span class="w">
  </span><span class="n">PI_2</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">PI_2</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">s2_01</span><span class="o">%*%</span><span class="n">t</span><span class="p">(</span><span class="n">s2_01</span><span class="p">)</span><span class="w">

</span><span class="p">}</span><span class="w">

</span><span class="n">PI_1</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">PI_1</span><span class="o">/</span><span class="n">SIM</span><span class="w">
</span><span class="n">PI_2</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">PI_2</span><span class="o">/</span><span class="n">SIM</span></code></pre></figure>

<h2 id="exact-matrix-of-second-order-inclusion">Exact matrix of second order inclusion</h2>
<p>The function <code class="language-plaintext highlighter-rouge">sys_devillepi2</code> computes the exact second order inclusion probabilities.</p>

<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">PI</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">sys_devillepi2</span><span class="p">(</span><span class="n">pik</span><span class="p">)</span><span class="w"> </span><span class="c1"># compute the second order inclusion probabilities</span></code></pre></figure>

<h2 id="results">Results</h2>

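<p>We display the three matrices as images: the Monte Carlo estimates obtained with <code class="language-plaintext highlighter-rouge">spm</code> and with <code class="language-plaintext highlighter-rouge">sys_deville</code>, followed by the exact matrix returned by <code class="language-plaintext highlighter-rouge">sys_devillepi2</code>. If the exact computation is correct, the three images should look alike.</p>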

<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">PI_1_sp</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">as</span><span class="p">(</span><span class="n">as.matrix</span><span class="p">(</span><span class="n">PI_1</span><span class="p">),</span><span class="s2">"sparseMatrix"</span><span class="p">)</span><span class="w">
</span><span class="n">PI_2_sp</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">as</span><span class="p">(</span><span class="n">as.matrix</span><span class="p">(</span><span class="n">PI_2</span><span class="p">),</span><span class="s2">"sparseMatrix"</span><span class="p">)</span><span class="w">
</span><span class="n">PI_sp</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">as</span><span class="p">(</span><span class="n">as.matrix</span><span class="p">(</span><span class="n">PI</span><span class="p">),</span><span class="s2">"sparseMatrix"</span><span class="p">)</span><span class="w">

</span><span class="n">image</span><span class="p">(</span><span class="n">PI_1_sp</span><span class="p">)</span></code></pre></figure>

<p><img src="/figs/sysDev/unnamed-chunk-4-1.png" alt="center" /></p>

<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">image</span><span class="p">(</span><span class="n">PI_2_sp</span><span class="p">)</span></code></pre></figure>

<p><img src="/figs/sysDev/unnamed-chunk-4-2.png" alt="center" /></p>

<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">image</span><span class="p">(</span><span class="n">PI_sp</span><span class="p">)</span></code></pre></figure>

<p><img src="/figs/sysDev/unnamed-chunk-4-3.png" alt="center" /></p>

<h2 id="accuracy-test">Accuracy test</h2>

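<p>If the exact probabilities are correct, the standardized difference between each Monte Carlo estimate and the exact value, |π̂<sub>kl</sub> − π<sub>kl</sub>| / √(π<sub>kl</sub>(1 − π<sub>kl</sub>)/SIM), is approximately standard normal. Roughly 95% of the nonzero entries should therefore fall below 1.96.</p>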

<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># proportional test, these values should be approximately to 0.95</span><span class="w">
</span><span class="nf">length</span><span class="p">(</span><span class="w"> </span><span class="n">which</span><span class="p">(</span><span class="nf">abs</span><span class="p">((</span><span class="n">PI_1_sp</span><span class="o">@</span><span class="n">x</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">PI_sp</span><span class="o">@</span><span class="n">x</span><span class="p">)</span><span class="o">/</span><span class="nf">sqrt</span><span class="p">(</span><span class="n">PI_sp</span><span class="o">@</span><span class="n">x</span><span class="o">*</span><span class="p">(</span><span class="m">1</span><span class="o">-</span><span class="n">PI_sp</span><span class="o">@</span><span class="n">x</span><span class="p">)</span><span class="o">/</span><span class="n">SIM</span><span class="p">))</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="m">1.96</span><span class="p">))</span><span class="o">/</span><span class="nf">length</span><span class="p">(</span><span class="n">PI_sp</span><span class="o">@</span><span class="n">x</span><span class="p">)</span></code></pre></figure>

<figure class="highlight"><pre><code class="language-text" data-lang="text">#&gt; [1] 0.961039</code></pre></figure>

<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="nf">length</span><span class="p">(</span><span class="w"> </span><span class="n">which</span><span class="p">(</span><span class="nf">abs</span><span class="p">((</span><span class="n">PI_2_sp</span><span class="o">@</span><span class="n">x</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">PI_sp</span><span class="o">@</span><span class="n">x</span><span class="p">)</span><span class="o">/</span><span class="nf">sqrt</span><span class="p">(</span><span class="n">PI_sp</span><span class="o">@</span><span class="n">x</span><span class="o">*</span><span class="p">(</span><span class="m">1</span><span class="o">-</span><span class="n">PI_sp</span><span class="o">@</span><span class="n">x</span><span class="p">)</span><span class="o">/</span><span class="n">SIM</span><span class="p">))</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="m">1.96</span><span class="p">))</span><span class="o">/</span><span class="nf">length</span><span class="p">(</span><span class="n">PI_sp</span><span class="o">@</span><span class="n">x</span><span class="p">)</span></code></pre></figure>

<figure class="highlight"><pre><code class="language-text" data-lang="text">#&gt; [1] 0.9448052</code></pre></figure>

<h2 id="references">References</h2>

<p>Chauvet, G. (2012), On a characterization of ordered pivotal sampling, <em>Bernoulli</em>, 18(4):1320-1340
DOI: <a href="https://doi.org/10.3150/11-BEJ380">https://doi.org/10.3150/11-BEJ380</a></p>]]></content><author><name>Raphaël Jauslin</name></author><summary type="html"><![CDATA[Deville's systematic second-order inclusion probabilities.]]></summary></entry><entry><title type="html">Sequential balanced sampling</title><link href="https://rjauslin.github.io/balseq/" rel="alternate" type="text/html" title="Sequential balanced sampling" /><published>2022-06-06T00:00:00+00:00</published><updated>2022-06-06T00:00:00+00:00</updated><id>https://rjauslin.github.io/sequential_balanced</id><content type="html" xml:base="https://rjauslin.github.io/balseq/"><![CDATA[<h2 id="introduction">Introduction</h2>

<p>A sample is balanced on a set of auxiliary variables when the Horvitz-Thompson estimators of their totals equal, or nearly equal, the known population totals. In this vignette, we explain how to use the <code class="language-plaintext highlighter-rouge">balseq</code> function to select a sample that is both balanced and spatially spread. For a detailed description of the method, see <a href="https://doi.org/10.1002/env.2776">doi:10.1002/env.2776</a>.</p>

<h2 id="loading-data">Loading Data</h2>

<p>We use the <code class="language-plaintext highlighter-rouge">belgianmunicipalities</code> dataset from the <code class="language-plaintext highlighter-rouge">sampling</code> package, which does not contain spatial coordinates. Fortunately, a GeoJSON file of the municipality boundaries is available on <a href="https://hub.arcgis.com/datasets/esribeluxdata::belgium-municipalities-1/about">ArcGIS Hub</a>. We convert it into an <code class="language-plaintext highlighter-rouge">sf</code> object and compute the centroid of each municipality. These centroids are later used to spread the sample over space.</p>

<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># to load data</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">sampling</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">geojsonio</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">ggplot2</span><span class="p">)</span><span class="w"> 
</span><span class="n">library</span><span class="p">(</span><span class="n">viridis</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">rgeos</span><span class="p">)</span><span class="w"> 
</span><span class="n">library</span><span class="p">(</span><span class="n">sf</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">rmapshaper</span><span class="p">)</span><span class="w">

</span><span class="n">data</span><span class="p">(</span><span class="s2">"belgianmunicipalities"</span><span class="p">)</span><span class="w">

</span><span class="c1"># load geojson directly from the url</span><span class="w">
</span><span class="n">belg</span><span class="w">  </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">geojson_read</span><span class="p">(</span><span class="s2">"https://opendata.arcgis.com/datasets/9589d9e5e5904f1ea8d245b54f51b4fd_0.geojson"</span><span class="p">,</span><span class="n">what</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"sp"</span><span class="p">)</span><span class="w">

</span><span class="c1"># simplify the variable and transform it into a sf object</span><span class="w">
</span><span class="n">belg</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">rmapshaper</span><span class="o">::</span><span class="n">ms_simplify</span><span class="p">(</span><span class="n">input</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">belg</span><span class="p">,</span><span class="n">keep</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.01</span><span class="p">)</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w">
  </span><span class="n">st_as_sf</span><span class="p">()</span><span class="w">

</span><span class="n">coord</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">gCentroid</span><span class="p">(</span><span class="n">as</span><span class="p">(</span><span class="n">belg</span><span class="p">,</span><span class="w"> </span><span class="s2">"Spatial"</span><span class="p">),</span><span class="w"> </span><span class="n">byid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w">

</span><span class="c1"># concatenated file</span><span class="w">
</span><span class="n">Belgium</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">cbind</span><span class="p">(</span><span class="n">belg</span><span class="p">,</span><span class="n">belgianmunicipalities</span><span class="p">,</span><span class="n">coord</span><span class="p">)</span><span class="w">
</span><span class="n">head</span><span class="p">(</span><span class="n">Belgium</span><span class="p">)</span></code></pre></figure>

<figure class="highlight"><pre><code class="language-text" data-lang="text">#&gt; Simple feature collection with 6 features and 26 fields
#&gt; Geometry type: MULTIPOLYGON
#&gt; Dimension:     XY
#&gt; Bounding box:  xmin: 4.216306 ymin: 51.08066 xmax: 4.564865 ymax: 51.37764
#&gt; Geodetic CRS:  WGS 84
#&gt;   OBJECTID   ADMUNAFR   ADMUNADU   ADMUNAGE   Communes CODE_INS arrond    Commune   INS Province Arrondiss
#&gt; 1        1 AARTSELAAR AARTSELAAR AARTSELAAR Aartselaar    11001     11 Aartselaar 11001        1        11
#&gt; 2        2     ANVERS  ANTWERPEN  ANTWERPEN  Antwerpen    11002     11     Anvers 11002        1        11
#&gt; 3        3   BOECHOUT   BOECHOUT   BOECHOUT   Boechout    11004     11   Boechout 11004        1        11
#&gt; 4        4       BOOM       BOOM       BOOM       Boom    11005     11       Boom 11005        1        11
#&gt; 5        5   BORSBEEK   BORSBEEK   BORSBEEK   Borsbeek    11007     11   Borsbeek 11007        1        11
#&gt; 6        6 BRASSCHAAT BRASSCHAAT BRASSCHAAT Brasschaat    11008     11 Brasschaat 11008        1        11
#&gt;    Men04 Women04  Tot04  Men03 Women03  Tot03 Diffmen Diffwom DiffTOT TaxableIncome Totaltaxation averageincome
#&gt; 1   6971    7169  14140   7010    7243  14253     -39     -74    -113     242104077      74976114         33809
#&gt; 2 223677  233642 457319 221767  232405 454172    1910    1237    3147    5416418842    1423715652         22072
#&gt; 3   6027    5927  11954   6005    5942  11947      22     -15       7     167616996      50739035         29453
#&gt; 4   7640    8066  15706   7535    7952  15487     105     114     219     186075961      46636930         21907
#&gt; 5   4948    5328  10276   4951    5322  10273      -3       6       3     143225590      40564374         26632
#&gt; 6  18142   18916  37058  18217   18903  37120     -75      13     -62     533368826     153629397         30574
#&gt;   medianincome        x        y                       geometry
#&gt; 1        23901 4.382005 51.13223 MULTIPOLYGON (((4.400451 51...
#&gt; 2        17226 4.369578 51.26067 MULTIPOLYGON (((4.368136 51...
#&gt; 3        21613 4.516463 51.16576 MULTIPOLYGON (((4.530071 51...
#&gt; 4        17537 4.371434 51.09387 MULTIPOLYGON (((4.385267 51...
#&gt; 5        20739 4.488162 51.19138 MULTIPOLYGON (((4.509002 51...
#&gt; 6        21523 4.500281 51.30940 MULTIPOLYGON (((4.540815 51...</code></pre></figure>

<h2 id="data-visualization">Data visualization</h2>

<p>We visualize Belgian municipalities with their average income using ggplot2.</p>

<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">p</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">ggplot</span><span class="p">()</span><span class="o">+</span><span class="w">
  </span><span class="n">geom_sf</span><span class="p">(</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Belgium</span><span class="p">,</span><span class="n">aes</span><span class="p">(</span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">averageincome</span><span class="p">),</span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.1</span><span class="p">)</span><span class="o">+</span><span class="w">
  </span><span class="n">scale_fill_viridis_c</span><span class="p">(</span><span class="n">option</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"G"</span><span class="p">)</span><span class="w">
</span><span class="n">p</span></code></pre></figure>

<p><img src="/figs/sequential_balanced/unnamed-chunk-3-1.png" title="center" alt="center" width="100%" /></p>

<h2 id="inclusion-probabilites">Inclusion probabilites</h2>

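<p>A good sample should reflect the characteristics of the population. Here every unit receives the same inclusion probability, π<sub>k</sub> = n/N, so the probabilities sum to n = 50 and the sample contains 50 units. The sample is balanced on the following variables:</p>
<ul>
  <li>the total population in 2004 (<code class="language-plaintext highlighter-rouge">Tot04</code>);</li>
  <li>the number of women in 2004 (<code class="language-plaintext highlighter-rouge">Women04</code>);</li>
  <li>the taxable income (<code class="language-plaintext highlighter-rouge">TaxableIncome</code>);</li>
  <li>the changes in the number of men and women between 2003 and 2004 (<code class="language-plaintext highlighter-rouge">Diffmen</code>, <code class="language-plaintext highlighter-rouge">Diffwom</code>).</li>
</ul>
<p>The vector <code class="language-plaintext highlighter-rouge">pik</code> itself is also added as a balancing variable, which fixes the sample size.</p>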

<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">N</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">nrow</span><span class="p">(</span><span class="n">Belgium</span><span class="p">)</span><span class="w"> </span><span class="c1"># population total</span><span class="w">
</span><span class="n">n</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="m">50</span><span class="w"> </span><span class="c1"># sample size</span><span class="w">

</span><span class="c1"># variable of interest</span><span class="w">
</span><span class="n">y</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">belgianmunicipalities</span><span class="o">$</span><span class="n">averageincome</span><span class="w">

</span><span class="c1"># auxiliary variables</span><span class="w">
</span><span class="n">Xaux</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">cbind</span><span class="p">(</span><span class="n">belgianmunicipalities</span><span class="o">$</span><span class="n">Tot04</span><span class="p">,</span><span class="w">
              </span><span class="n">belgianmunicipalities</span><span class="o">$</span><span class="n">Women04</span><span class="p">,</span><span class="w">
              </span><span class="n">belgianmunicipalities</span><span class="o">$</span><span class="n">TaxableIncome</span><span class="p">,</span><span class="w">
              </span><span class="n">belgianmunicipalities</span><span class="o">$</span><span class="n">Diffmen</span><span class="p">,</span><span class="w">
              </span><span class="n">belgianmunicipalities</span><span class="o">$</span><span class="n">Diffwom</span><span class="p">)</span><span class="w">


</span><span class="c1"># inclusion probabilities</span><span class="w">
</span><span class="n">pik</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="n">n</span><span class="o">/</span><span class="n">N</span><span class="p">,</span><span class="n">N</span><span class="p">)</span><span class="w">

</span><span class="n">Xaux</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">cbind</span><span class="p">(</span><span class="n">pik</span><span class="p">,</span><span class="n">Xaux</span><span class="p">)</span><span class="w"> </span><span class="c1"># add pik to fixed sample size</span><span class="w">
</span><span class="n">Xspread</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">coord</span></code></pre></figure>

<h2 id="balanced-sampling">Balanced sampling</h2>

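<p>We compare the sample selected by <code class="language-plaintext highlighter-rouge">balseq</code> with a sample selected by the cube method (<code class="language-plaintext highlighter-rouge">samplecube</code>) and with a simple random sample without replacement (<code class="language-plaintext highlighter-rouge">srswor</code>). For each sample, we compute the percentage deviation of the Horvitz-Thompson estimates of the auxiliary totals from the true totals. Values close to 0 indicate good balance. The first column, <code class="language-plaintext highlighter-rouge">pik</code>, is 0 for all three samples because each one contains exactly 50 units.</p>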

<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">StratifiedSampling</span><span class="p">)</span><span class="w">

</span><span class="n">s</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">balseq</span><span class="p">(</span><span class="n">pik</span><span class="p">,</span><span class="n">Xaux</span><span class="p">)</span><span class="w">
</span><span class="n">s_cube</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">samplecube</span><span class="p">(</span><span class="n">Xaux</span><span class="p">,</span><span class="n">pik</span><span class="p">,</span><span class="n">comment</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">)</span><span class="w">
</span><span class="n">s_srs</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">srswor</span><span class="p">(</span><span class="n">n</span><span class="p">,</span><span class="n">N</span><span class="p">)</span><span class="w">


</span><span class="n">TOT</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">colSums</span><span class="p">(</span><span class="n">Xaux</span><span class="p">)</span><span class="w">
</span><span class="n">EST1</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">colSums</span><span class="p">(</span><span class="n">Xaux</span><span class="p">[</span><span class="n">s</span><span class="p">,]</span><span class="o">/</span><span class="n">pik</span><span class="p">[</span><span class="n">s</span><span class="p">])</span><span class="w">
</span><span class="n">EST2</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">colSums</span><span class="p">(</span><span class="n">Xaux</span><span class="p">[</span><span class="n">s_cube</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">1</span><span class="p">,]</span><span class="o">/</span><span class="n">pik</span><span class="p">[</span><span class="n">s_cube</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">1</span><span class="p">])</span><span class="w">
</span><span class="n">EST3</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">colSums</span><span class="p">(</span><span class="n">Xaux</span><span class="p">[</span><span class="n">s_srs</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">1</span><span class="p">,]</span><span class="o">/</span><span class="n">pik</span><span class="p">[</span><span class="n">s_srs</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">1</span><span class="p">])</span><span class="w">


</span><span class="m">100</span><span class="o">*</span><span class="w"> </span><span class="p">(</span><span class="n">EST1</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">TOT</span><span class="p">)</span><span class="o">/</span><span class="n">TOT</span></code></pre></figure>

<figure class="highlight"><pre><code class="language-text" data-lang="text">#&gt;       pik                                                   
#&gt;  0.000000  4.908396  4.890552  6.873691 14.225213  6.440032</code></pre></figure>

<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="m">100</span><span class="o">*</span><span class="w"> </span><span class="p">(</span><span class="n">EST2</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">TOT</span><span class="p">)</span><span class="o">/</span><span class="n">TOT</span></code></pre></figure>

<figure class="highlight"><pre><code class="language-text" data-lang="text">#&gt;        pik                                                        
#&gt;   0.000000  -4.068363  -4.153367  -8.105537 -19.212958 -14.508554</code></pre></figure>

<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="m">100</span><span class="o">*</span><span class="w"> </span><span class="p">(</span><span class="n">EST3</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">TOT</span><span class="p">)</span><span class="o">/</span><span class="n">TOT</span></code></pre></figure>

<figure class="highlight"><pre><code class="language-text" data-lang="text">#&gt;       pik                                                   
#&gt;   0.00000  12.99101  12.52005  12.72183 -13.28123 -21.77427</code></pre></figure>
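<p>The three deviation computations above follow the same pattern, so they can be wrapped in a small helper. The function <code class="language-plaintext highlighter-rouge">balance_deviation</code> below is hypothetical. It expects the indices of the selected units, so a 0/1 indicator vector such as <code class="language-plaintext highlighter-rouge">s_cube</code> should be passed as <code class="language-plaintext highlighter-rouge">which(s_cube == 1)</code>.</p>

<figure class="highlight"><pre><code class="language-r" data-lang="r"># Hypothetical helper: percentage deviation of HT estimates from true totals
balance_deviation = function(X, pik, idx) {
  X = as.matrix(X)
  tot = colSums(X)
  # each selected row is divided by its own inclusion probability
  est = colSums(X[idx, , drop = FALSE] / pik[idx])
  100 * (est - tot) / tot
}

# Toy check: (1 + 4) / 0.5 = 10, which equals the total 1 + 2 + 3 + 4
X_toy = cbind(y = c(1, 2, 3, 4))
balance_deviation(X_toy, rep(0.5, 4), c(1, 4))   # y = 0</code></pre></figure>

<p>Assuming <code class="language-plaintext highlighter-rouge">balseq</code> returns indices, as the code above implies, <code class="language-plaintext highlighter-rouge">balance_deviation(Xaux, pik, s)</code> should reproduce the first of the three outputs.</p>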

<h2 id="spread-sampling">Spread sampling</h2>

<p>To spread the sample over space, we pass the geographic coordinates, as a matrix, to the <code class="language-plaintext highlighter-rouge">Xspread</code> argument of <code class="language-plaintext highlighter-rouge">balseq</code>. Here <code class="language-plaintext highlighter-rouge">coord</code> is the output of <code class="language-plaintext highlighter-rouge">gCentroid</code>, which is an <code class="language-plaintext highlighter-rouge">S4</code> object of class <code class="language-plaintext highlighter-rouge">SpatialPoints</code>. The centroid coordinates are stored in its <code class="language-plaintext highlighter-rouge">coords</code> slot, which we extract with the <code class="language-plaintext highlighter-rouge">@</code> operator.</p>
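<p>The <code class="language-plaintext highlighter-rouge">@</code> operator extracts a slot from an S4 object. In <code class="language-plaintext highlighter-rouge">sp</code>, the <code class="language-plaintext highlighter-rouge">coords</code> slot of a <code class="language-plaintext highlighter-rouge">SpatialPoints</code> object already holds a numeric matrix with one row per point, so <code class="language-plaintext highlighter-rouge">as.matrix()</code> acts only as a safeguard. The <code class="language-plaintext highlighter-rouge">sp</code> package also offers the accessor <code class="language-plaintext highlighter-rouge">coordinates()</code>. The toy class below is hypothetical and only mimics that slot.</p>

<figure class="highlight"><pre><code class="language-r" data-lang="r"># Hypothetical S4 class mimicking the 'coords' slot of sp's SpatialPoints
setClass("ToyPoints", slots = c(coords = "matrix"))
pts_toy = new("ToyPoints", coords = cbind(x = c(4.38, 4.37), y = c(51.13, 51.26)))
xy_toy = pts_toy@coords      # @ extracts the slot, here a 2 x 2 numeric matrix
is.matrix(xy_toy)            # TRUE</code></pre></figure>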

<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">s</span><span class="w"> </span><span class="o">&lt;-</span><span class="w">  </span><span class="n">balseq</span><span class="p">(</span><span class="n">pik</span><span class="p">,</span><span class="w">
             </span><span class="n">Xaux</span><span class="p">,</span><span class="w">
             </span><span class="n">Xspread</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">as.matrix</span><span class="p">(</span><span class="n">Xspread</span><span class="o">@</span><span class="n">coords</span><span class="p">))</span><span class="w">
</span><span class="n">p</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">p</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">geom_point</span><span class="p">(</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Belgium</span><span class="p">,</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="p">,</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">y</span><span class="p">),</span><span class="n">shape</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="n">alpha</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.9</span><span class="p">)</span><span class="o">+</span><span class="w">
  </span><span class="n">geom_point</span><span class="p">(</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Belgium</span><span class="p">[</span><span class="n">s</span><span class="p">,],</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="p">,</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">y</span><span class="p">),</span><span class="n">colour</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"red"</span><span class="p">)</span><span class="o">+</span><span class="w">
  </span><span class="n">scale_fill_viridis_c</span><span class="p">(</span><span class="n">option</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"G"</span><span class="p">)</span></code></pre></figure>

<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">p</span></code></pre></figure>

<p><img src="/figs/sequential_balanced/unnamed-chunk-6-1.png" title="center" alt="center" width="100%" /></p>]]></content><author><name>Raphaël Jauslin</name></author><summary type="html"><![CDATA[Example of sequential spatially balanced sampling.]]></summary></entry><entry><title type="html">Wave Sampling</title><link href="https://rjauslin.github.io/wave/" rel="alternate" type="text/html" title="Wave Sampling" /><published>2021-10-19T09:11:51+00:00</published><updated>2021-10-19T09:11:51+00:00</updated><id>https://rjauslin.github.io/wave</id><content type="html" xml:base="https://rjauslin.github.io/wave/"><![CDATA[<h2 id="introduction">Introduction</h2>

<p>Geographical data are generally spatially autocorrelated, so it is preferable to avoid selecting neighboring units. We introduce a new method for selecting well-spread samples from a finite spatial population with equal or unequal inclusion probabilities. The proposed method, called <code class="language-plaintext highlighter-rouge">wave</code> (Weakly Associated Vectors), defines the contiguity structure using dense stratification. It satisfies the inclusion probabilities exactly while providing well-spread samples. This document introduces the use of the <code class="language-plaintext highlighter-rouge">wave()</code> function.</p>

<h2 id="data-generation">Data Generation</h2>

<p>We use the <code class="language-plaintext highlighter-rouge">meuse</code> dataset from the <code class="language-plaintext highlighter-rouge">sp</code> package, described as follows:</p>
<blockquote>
  <p><em>This dataset provides locations and topsoil heavy metal concentrations, along with various soil and landscape variables recorded at observation locations in a floodplain of the river Meuse, near Stein (NL).</em></p>
</blockquote>

<p>Following <a href="https://doi.org/10.1002/env.2194">Grafström and Tillé (2013)</a>, we generate inclusion probabilities proportional to copper concentration, a strongly spatially correlated variable, for a sample of size \(n = 30\).</p>

<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">sp</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">sf</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">sampling</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">WaveSampling</span><span class="p">)</span><span class="w">
</span><span class="n">data</span><span class="p">(</span><span class="s2">"meuse"</span><span class="p">)</span><span class="w">
</span><span class="n">data</span><span class="p">(</span><span class="s2">"meuse.riv"</span><span class="p">)</span><span class="w">
</span><span class="n">meuse.riv</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">meuse.riv</span><span class="p">[</span><span class="n">which</span><span class="p">(</span><span class="n">meuse.riv</span><span class="p">[,</span><span class="m">2</span><span class="p">]</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="m">334200</span><span class="w"> </span><span class="o">&amp;</span><span class="w"> </span><span class="n">meuse.riv</span><span class="p">[,</span><span class="m">2</span><span class="p">]</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="m">329400</span><span class="p">),]</span><span class="w">
</span><span class="n">meuse_sf</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">st_as_sf</span><span class="p">(</span><span class="n">meuse</span><span class="p">,</span><span class="w"> </span><span class="n">coords</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"x"</span><span class="p">,</span><span class="w"> </span><span class="s2">"y"</span><span class="p">),</span><span class="w"> </span><span class="n">crs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">28992</span><span class="p">,</span><span class="w"> </span><span class="n">agr</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"constant"</span><span class="p">)</span><span class="w">

</span><span class="n">X</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">scale</span><span class="p">(</span><span class="n">as.matrix</span><span class="p">(</span><span class="n">meuse</span><span class="p">[,</span><span class="m">1</span><span class="o">:</span><span class="m">2</span><span class="p">]))</span><span class="w">
</span><span class="n">pik</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">inclusionprobabilities</span><span class="p">(</span><span class="n">meuse</span><span class="o">$</span><span class="n">copper</span><span class="p">,</span><span class="m">30</span><span class="p">)</span></code></pre></figure>
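<p>The probabilities returned by <code class="language-plaintext highlighter-rouge">inclusionprobabilities()</code> are proportional to copper and capped at 1. The idea can be sketched in base R with a hypothetical helper, <code class="language-plaintext highlighter-rouge">pik_prop()</code>, which is our own illustration and not the <code class="language-plaintext highlighter-rouge">sampling</code> package code: units whose probability reaches 1 are fixed at 1, and the remaining sample size is redistributed proportionally over the other units until nothing changes.</p>

<figure class="highlight"><pre><code class="language-r" data-lang="r"># Hypothetical helper (not a package function): probabilities proportional
# to x, capped at 1, with the remaining sample size redistributed over the
# uncapped units until the values stabilise.
pik_prop = function(x, n) {
  pik = n * x / sum(x)
  repeat {
    capped = pmin(pik, 1) == 1          # units whose probability reaches 1
    pik[capped] = 1
    rest = !capped
    updated = (n - sum(capped)) * x[rest] / sum(x[rest])
    if (isTRUE(all.equal(updated, pik[rest]))) break
    pik[rest] = updated
  }
  pik
}

pik_demo = pik_prop(c(1, 1, 1, 1, 10), n = 2)
pik_demo        # 0.25 0.25 0.25 0.25 1.00
sum(pik_demo)   # 2, the sample size
</code></pre></figure>

<p>On the <code class="language-plaintext highlighter-rouge">meuse</code> data, <code class="language-plaintext highlighter-rouge">pik_prop(meuse$copper, 30)</code> can be compared with <code class="language-plaintext highlighter-rouge">pik</code>; the package may handle edge cases such as zero or negative values differently.</p>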

<h2 id="sample-selection">Sample selection</h2>

<p>The sample is then selected with a single call to the <code class="language-plaintext highlighter-rouge">wave()</code> function, which returns a vector of size \(N\) equal to 1 for the selected units and 0 otherwise.</p>

<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">s</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">wave</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="n">pik</span><span class="p">)</span><span class="w">
</span><span class="nf">sum</span><span class="p">(</span><span class="n">s</span><span class="p">)</span><span class="w">
</span><span class="c1">#&gt; [1] 30</span></code></pre></figure>

<h2 id="data-visualization">Data visualization</h2>

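<p>The selected sample is visualized using <code class="language-plaintext highlighter-rouge">ggplot2</code>: population units are drawn as open circles and selected units as filled circles, with symbol size proportional to copper concentration, and the river Meuse is shown in blue.</p>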

<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">ggplot2</span><span class="p">)</span><span class="w">
</span><span class="n">p</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">ggplot</span><span class="p">()</span><span class="o">+</span><span class="w">
  </span><span class="n">geom_sf</span><span class="p">(</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">meuse_sf</span><span class="p">,</span><span class="n">aes</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="n">copper</span><span class="p">),</span><span class="n">show.legend</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'point'</span><span class="p">,</span><span class="n">shape</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="n">stroke</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.3</span><span class="p">)</span><span class="o">+</span><span class="w">
  </span><span class="n">geom_polygon</span><span class="p">(</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data.frame</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">meuse.riv</span><span class="p">[,</span><span class="m">1</span><span class="p">],</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">meuse.riv</span><span class="p">[,</span><span class="m">2</span><span class="p">]),</span><span class="w">
               </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="p">,</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">y</span><span class="p">),</span><span class="w">
               </span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"lightskyblue2"</span><span class="p">,</span><span class="w">
               </span><span class="n">colour</span><span class="o">=</span><span class="w"> </span><span class="s2">"grey50"</span><span class="p">)</span><span class="o">+</span><span class="w">
  </span><span class="n">geom_point</span><span class="p">(</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">meuse</span><span class="p">,</span><span class="w">
             </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="p">,</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">y</span><span class="p">,</span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">copper</span><span class="p">),</span><span class="w">
             </span><span class="n">shape</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w">
             </span><span class="n">stroke</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.3</span><span class="p">)</span><span class="o">+</span><span class="w">
  </span><span class="n">geom_point</span><span class="p">(</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">meuse</span><span class="p">[</span><span class="n">which</span><span class="p">(</span><span class="n">s</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">1</span><span class="p">),],</span><span class="w">
             </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="p">,</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">y</span><span class="p">,</span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">copper</span><span class="p">),</span><span class="w">
             </span><span class="n">shape</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">16</span><span class="p">)</span><span class="o">+</span><span class="w">
  </span><span class="n">labs</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Longitude"</span><span class="p">,</span><span class="w">
       </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Latitude"</span><span class="p">,</span><span class="w">
       </span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">NULL</span><span class="p">,</span><span class="w">
       </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Copper"</span><span class="p">,</span><span class="w">
       </span><span class="n">caption</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">NULL</span><span class="p">)</span><span class="o">+</span><span class="w">
  </span><span class="n">scale_size</span><span class="p">(</span><span class="n">range</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">0.5</span><span class="p">,</span><span class="w"> </span><span class="m">3.5</span><span class="p">))</span><span class="o">+</span><span class="w">
  </span><span class="n">theme_minimal</span><span class="p">()</span><span class="w">
</span><span class="n">p</span></code></pre></figure>

<p><img src="/figs/2021-10-19-wave/unnamed-chunk-3-1.png" alt="center" /></p>

<h2 id="spatial-balance">Spatial balance</h2>

<h3 id="voronoï-polygons">Voronoï polygons</h3>
<p>One way of measuring the spread of a sample, developed by <a href="https://doi.org/10.1198/016214504000000250">Stevens Jr. and Olsen (2004)</a> and later adopted by <a href="https://doi.org/10.1111/j.1541-0420.2011.01699.x">Grafström et al. (2012)</a>, is based on Voronoï polygons and is given by</p>

\[B(\bf s) = \frac{1}{n}\sum_{i \in s} (v_i - 1)^2\]

<p>where \(n\) is the sample size, \(v_i\) is the sum of the inclusion probabilities of the population units lying in the Voronoï polygon of the selected unit \(i\) (that is, the units closer to \(i\) than to any other selected unit), and \(\bf s\) is the vector of size \(N\) whose elements equal 1 for selected units and 0 otherwise. This quantity is implemented in the <code class="language-plaintext highlighter-rouge">BalancedSampling</code> package as the function <code class="language-plaintext highlighter-rouge">sb()</code>. We compute the values \(v_i\) with the function <code class="language-plaintext highlighter-rouge">sb_vk()</code>.</p>

<p>The closer \(B(\bf s)\) is to zero, the better the spatial balance of the sample. The plot below shows the Voronoï polygons of the selected units, coloured by their value \(v_i\).</p>

<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">sp</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">sampling</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">ggvoronoi</span><span class="p">)</span></code></pre></figure>

<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">data</span><span class="p">(</span><span class="s2">"meuse"</span><span class="p">)</span><span class="w">
</span><span class="n">data</span><span class="p">(</span><span class="s2">"meuse.area"</span><span class="p">)</span><span class="w">

</span><span class="n">v</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">sb_vk</span><span class="p">(</span><span class="n">pik</span><span class="p">,</span><span class="n">as.matrix</span><span class="p">(</span><span class="n">meuse</span><span class="p">[,</span><span class="m">1</span><span class="o">:</span><span class="m">2</span><span class="p">]),</span><span class="n">s</span><span class="p">)</span><span class="w">
</span><span class="n">meuse</span><span class="o">$</span><span class="n">v</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">v</span><span class="w">

</span><span class="n">p</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">p</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">geom_voronoi</span><span class="p">(</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">meuse</span><span class="p">[</span><span class="n">which</span><span class="p">(</span><span class="n">s</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">1</span><span class="p">),],</span><span class="w">
               </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="p">,</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">y</span><span class="p">,</span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">v</span><span class="p">),</span><span class="w">
               </span><span class="n">outline</span><span class="w"> </span><span class="o">=</span><span class="n">as.data.frame</span><span class="p">(</span><span class="n">meuse.area</span><span class="p">),</span><span class="w">
               </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.1</span><span class="p">,</span><span class="w">
               </span><span class="n">colour</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"black"</span><span class="p">)</span><span class="o">+</span><span class="w">
  </span><span class="n">geom_point</span><span class="p">(</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">meuse</span><span class="p">,</span><span class="w">
             </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="p">,</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">y</span><span class="p">,</span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">copper</span><span class="p">),</span><span class="w">
             </span><span class="n">shape</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w">
             </span><span class="n">stroke</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.3</span><span class="p">)</span><span class="o">+</span><span class="w">
  </span><span class="n">geom_point</span><span class="p">(</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">meuse</span><span class="p">[</span><span class="n">which</span><span class="p">(</span><span class="n">s</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">1</span><span class="p">),],</span><span class="w">
             </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="p">,</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">y</span><span class="p">,</span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">copper</span><span class="p">),</span><span class="w">
             </span><span class="n">shape</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">16</span><span class="p">)</span><span class="o">+</span><span class="w">
  </span><span class="n">scale_fill_gradient2</span><span class="p">(</span><span class="n">midpoint</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">)</span><span class="w">
</span><span class="n">p</span></code></pre></figure>

<p><img src="/figs/2021-10-19-wave/unnamed-chunk-4-1.png" alt="center" /></p>

<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">BalancedSampling</span><span class="o">::</span><span class="n">sb</span><span class="p">(</span><span class="n">pik</span><span class="p">,</span><span class="n">as.matrix</span><span class="p">(</span><span class="n">meuse</span><span class="p">[,</span><span class="m">1</span><span class="o">:</span><span class="m">2</span><span class="p">]),</span><span class="n">which</span><span class="p">(</span><span class="n">s</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">1</span><span class="p">))</span></code></pre></figure>

<figure class="highlight"><pre><code class="language-text" data-lang="text">#&gt; [1] 0.0910097</code></pre></figure>

<h3 id="moran-index">Moran index</h3>

<p>Another measure of spatial spread, developed by <a href="https://doi.org/10.1016/j.spasta.2018.02.001">Tillé et al. (2018)</a>, uses a corrected version of the traditional Moran’s \(I\) index. It relies on spatial weights \(w_{ij}\) that indicate how close unit \(j\) is to unit \(i\). Because these weights are usually constructed from the inclusion probabilities, the spatial weights matrix \(\bf W\) is generally not symmetric. The spatial balance measure is given by</p>

\[I_B =\frac{(\bf s-\bf \bar{s}_w)^\top \bf W (\bf s-\bf \bar{s}_w)}{\sqrt{(\bf s-\bf \bar{s}_w)^\top \bf D (\bf s-\bf \bar{s}_w) (\bf s-\bf \bar{s}_w)^\top \bf B (\bf s-\bf \bar{s}_w) }},\]

<p>where \(\bf D\) is the diagonal matrix containing the row sums \(w_i = \sum_{j \in U} w_{ij}\),</p>

\[\bf \bar{s}_w = \bf 1 \frac{\bf s^\top \bf W \bf 1}{\bf 1^\top \bf W \bf 1},\]

<p>and</p>

\[\bf B = \bf W^\top \bf D^{-1} \bf W - \frac{\bf W^\top \bf 1\bf 1^\top \bf W}{\bf1^\top \bf W \bf 1}.\]
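<p>The formulas above translate almost line by line into matrix algebra. The following base-R sketch uses a hypothetical helper, <code class="language-plaintext highlighter-rouge">moran_IB()</code>, which is not a package function. It takes \(w_i\) as the row sums of \(\bf W\), assumes every row of \(\bf W\) has a positive sum so that \(\bf D^{-1}\) exists, and is meant for building intuition rather than replacing <code class="language-plaintext highlighter-rouge">IB()</code>, whose implementation may differ in details.</p>

<figure class="highlight"><pre><code class="language-r" data-lang="r"># Hypothetical helper (not a package function): corrected Moran's I_B
# computed directly from the formulas above with base R matrix algebra.
moran_IB = function(W, s) {
  W = as.matrix(W)
  N = nrow(W)
  one = rep(1, N)
  wi = rowSums(W)                         # w_i, the row sums of W
  tot = sum(W)                            # 1' W 1
  D = diag(wi, nrow = N)
  Dinv = diag(1 / wi, nrow = N)           # requires every w_i to be positive
  sw = one * sum(s * wi) / tot            # s_bar_w = 1 (s' W 1) / (1' W 1)
  B = t(W) %*% Dinv %*% W - (t(W) %*% one %*% t(one) %*% W) / tot
  d = s - sw
  num = drop(t(d) %*% W %*% d)
  den = sqrt(drop(t(d) %*% D %*% d) * drop(t(d) %*% B %*% d))
  num / den
}

# Toy check: four units on a line, neighbours are the adjacent units
W_toy = matrix(0, 4, 4)
W_toy[cbind(1:3, 2:4)] = 1
W_toy[cbind(2:4, 1:3)] = 1
moran_IB(W_toy, c(1, 0, 1, 0))   # alternating selection: -1
moran_IB(W_toy, c(1, 1, 0, 0))   # clustered selection: 1/sqrt(3), about 0.577
</code></pre></figure>

<p>In the toy example, the alternating (well-spread) selection gives a negative value and the clustered selection a positive one.</p>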

<p>The Moran’s \(I\) index is implemented in the function <code class="language-plaintext highlighter-rouge">IB()</code>, and you can supply your own spatial weights through its argument <code class="language-plaintext highlighter-rouge">W</code>. There is no natural way of defining \(\bf W\); here we propose to consider, for each unit, only the nearest neighbours forming a stratum whose inclusion probabilities sum up to 1. This is implemented in the function <code class="language-plaintext highlighter-rouge">wpik()</code>. Another way of defining the spatial weights, developed by <a href="https://doi.org/10.1016/j.spasta.2018.02.001">Tillé et al. (2018)</a>, uses the inverse inclusion probability \(1/\pi_i\) to determine the neighbours of unit \(i\). It is implemented in the function <code class="language-plaintext highlighter-rouge">wpikInv()</code>. As explained by <a href="https://doi.org/10.1016/j.spasta.2018.02.001">Tillé et al. (2018)</a>, \(w_{ii}\) should equal 0 for all \(i \in U\). By construction, <code class="language-plaintext highlighter-rouge">wpik</code> returns a matrix whose diagonal is not zero, so to compute Moran’s \(I\) index with <code class="language-plaintext highlighter-rouge">wpik</code> we need to subtract the diagonal of the returned matrix, as done below.</p>

<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">W</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">wpik</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="n">pik</span><span class="p">)</span><span class="w">
</span><span class="n">W</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">W</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">diag</span><span class="p">(</span><span class="n">diag</span><span class="p">(</span><span class="n">W</span><span class="p">))</span><span class="w">
</span><span class="n">IB</span><span class="p">(</span><span class="n">W</span><span class="p">,</span><span class="n">s</span><span class="p">)</span></code></pre></figure>

<figure class="highlight"><pre><code class="language-text" data-lang="text">#&gt; [1] -0.4895601</code></pre></figure>

<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">W1</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">wpikInv</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="n">pik</span><span class="p">)</span><span class="w">
</span><span class="n">IB</span><span class="p">(</span><span class="n">W1</span><span class="p">,</span><span class="n">s</span><span class="p">)</span></code></pre></figure>

<figure class="highlight"><pre><code class="language-text" data-lang="text">#&gt; [1] -0.4554427</code></pre></figure>

<h2 id="references">References</h2>

<p>Grafström, A., Lundström, N. L. P., and Schelin, L. (2012). Spatially balanced sampling through the pivotal method, <em>Biometrics</em>, 68(2):514-520
DOI: <a href="https://doi.org/10.1111/j.1541-0420.2011.01699.x">https://doi.org/10.1111/j.1541-0420.2011.01699.x</a></p>

<p>Grafström, A. and Tillé, Y. (2013). Doubly balanced spatial sampling with spreading and restitution of auxiliary totals, <em>Environmetrics</em>, 24(2):120-131
DOI: <a href="https://doi.org/10.1002/env.2194">https://doi.org/10.1002/env.2194</a></p>

<p>Stevens Jr., D. L. and Olsen, A. R. (2004). Spatially balanced sampling of natural resources, <em>Journal of the American Statistical Association</em>, 99(465):262-278
DOI: <a href="https://doi.org/10.1198/016214504000000250">https://doi.org/10.1198/016214504000000250</a></p>

<p>Tillé, Y., Dickson, M. M., Espa, G., and Giuliani, D. (2018). Measuring the spatial balance of a sample: A new measure based on Moran’s I index, <em>Spatial Statistics</em>, 23:182-192
DOI: <a href="https://doi.org/10.1016/j.spasta.2018.02.001">https://doi.org/10.1016/j.spasta.2018.02.001</a></p>]]></content><author><name>Raphaël Jauslin</name></author><summary type="html"><![CDATA[Example of weakly associated vector on the Meuse dataset.]]></summary></entry><entry><title type="html">Statistical Matching using Optimal Transport</title><link href="https://rjauslin.github.io/matching/" rel="alternate" type="text/html" title="Statistical Matching using Optimal Transport" /><published>2021-10-19T00:00:00+00:00</published><updated>2021-10-19T00:00:00+00:00</updated><id>https://rjauslin.github.io/ot-matching</id><content type="html" xml:base="https://rjauslin.github.io/matching/"><![CDATA[<h2 id="introduction">Introduction</h2>

<p>In this vignette, we show how the main functions of the <code class="language-plaintext highlighter-rouge">StratifiedSampling</code> package can be used to estimate a contingency table. The analysis is based on the <code class="language-plaintext highlighter-rouge">eusilc</code> dataset from the <code class="language-plaintext highlighter-rouge">laeken</code> package. Each function used here is explained in detail in the manuscript by Raphaël Jauslin and Yves Tillé (2021), available at <a href="https://doi.org/10.1016/j.jspi.2022.12.003">doi:10.1016/j.jspi.2022.12.003</a>.</p>

<h2 id="contingency-table">Contingency Table</h2>

<p>To construct the contingency table, we cross the factor variable <code class="language-plaintext highlighter-rouge">pl030</code>, which represents economic status, with a discretized version of the equivalized household income, <code class="language-plaintext highlighter-rouge">eqIncome</code>. The discretization computes the quantiles of order 0.15, 0.30, 0.45, 0.60, 0.75 and 0.90 of <code class="language-plaintext highlighter-rouge">eqIncome</code> and defines income classes as the intervals between consecutive quantiles.</p>
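<p>The discretization step can be sketched in base R with <code class="language-plaintext highlighter-rouge">quantile()</code> and <code class="language-plaintext highlighter-rouge">cut()</code>. This is an illustration on simulated incomes (the exponential distribution and its rate are arbitrary assumptions), not the code applied to <code class="language-plaintext highlighter-rouge">eqIncome</code>; the six quantiles together with open-ended outer bounds yield seven income classes.</p>

<figure class="highlight"><pre><code class="language-r" data-lang="r">set.seed(1)
income = rexp(1000, rate = 1 / 20000)     # simulated incomes, for illustration

probs = c(0.15, 0.30, 0.45, 0.60, 0.75, 0.90)
breaks = c(-Inf, quantile(income, probs = probs), Inf)

# Each income falls in exactly one right-closed interval (b_k, b_k+1]
income_class = cut(income, breaks = breaks)
table(income_class)
</code></pre></figure>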

<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">laeken</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">sampling</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">StratifiedSampling</span><span class="p">)</span></code></pre></figure>

<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">data</span><span class="p">(</span><span class="s2">"eusilc"</span><span class="p">)</span><span class="w">
</span><span class="n">eusilc</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">na.omit</span><span class="p">(</span><span class="n">eusilc</span><span class="p">)</span><span class="w">
</span><span class="n">N</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">nrow</span><span class="p">(</span><span class="n">eusilc</span><span class="p">)</span><span class="w">


</span><span class="c1"># Xm holds the matching variables and id the unit identifiers</span><span class="w">
</span><span class="n">Xm</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">eusilc</span><span class="p">[,</span><span class="nf">c</span><span class="p">(</span><span class="s2">"hsize"</span><span class="p">,</span><span class="s2">"db040"</span><span class="p">,</span><span class="s2">"age"</span><span class="p">,</span><span class="s2">"rb090"</span><span class="p">,</span><span class="s2">"pb220a"</span><span class="p">)]</span><span class="w">
</span><span class="n">Xmcat</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">do.call</span><span class="p">(</span><span class="n">cbind</span><span class="p">,</span><span class="n">apply</span><span class="p">(</span><span class="n">Xm</span><span class="p">[,</span><span class="nf">c</span><span class="p">(</span><span class="m">2</span><span class="p">,</span><span class="m">4</span><span class="p">,</span><span class="m">5</span><span class="p">)],</span><span class="n">MARGIN</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="p">,</span><span class="n">FUN</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">disjunctive</span><span class="p">))</span><span class="w">
</span><span class="n">Xm</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">cbind</span><span class="p">(</span><span class="n">Xmcat</span><span class="p">,</span><span class="n">Xm</span><span class="p">[,</span><span class="o">-</span><span class="nf">c</span><span class="p">(</span><span class="m">2</span><span class="p">,</span><span class="m">4</span><span class="p">,</span><span class="m">5</span><span class="p">)])</span><span class="w">
</span><span class="n">id</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">eusilc</span><span class="o">$</span><span class="n">rb030</span><span class="w">


</span><span class="c1"># categorical income, split at the percentiles of eqIncome</span><span class="w">
</span><span class="n">c_income</span><span class="w">  </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">eusilc</span><span class="o">$</span><span class="n">eqIncome</span><span class="w">
</span><span class="n">q</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">quantile</span><span class="p">(</span><span class="n">eusilc</span><span class="o">$</span><span class="n">eqIncome</span><span class="p">,</span><span class="w"> </span><span class="n">probs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">seq</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">0.15</span><span class="p">))</span><span class="w">
</span><span class="n">c_income</span><span class="p">[</span><span class="n">which</span><span class="p">(</span><span class="n">eusilc</span><span class="o">$</span><span class="n">eqIncome</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="n">q</span><span class="p">[</span><span class="m">2</span><span class="p">])]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="s2">"(0,15]"</span><span class="w">
</span><span class="n">c_income</span><span class="p">[</span><span class="n">which</span><span class="p">(</span><span class="n">q</span><span class="p">[</span><span class="m">2</span><span class="p">]</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">eusilc</span><span class="o">$</span><span class="n">eqIncome</span><span class="w"> </span><span class="o">&amp;</span><span class="w"> </span><span class="n">eusilc</span><span class="o">$</span><span class="n">eqIncome</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="n">q</span><span class="p">[</span><span class="m">3</span><span class="p">])]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="s2">"(15,30]"</span><span class="w">
</span><span class="n">c_income</span><span class="p">[</span><span class="n">which</span><span class="p">(</span><span class="n">q</span><span class="p">[</span><span class="m">3</span><span class="p">]</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">eusilc</span><span class="o">$</span><span class="n">eqIncome</span><span class="w"> </span><span class="o">&amp;</span><span class="w"> </span><span class="n">eusilc</span><span class="o">$</span><span class="n">eqIncome</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="n">q</span><span class="p">[</span><span class="m">4</span><span class="p">])]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="s2">"(30,45]"</span><span class="w">
</span><span class="n">c_income</span><span class="p">[</span><span class="n">which</span><span class="p">(</span><span class="n">q</span><span class="p">[</span><span class="m">4</span><span class="p">]</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">eusilc</span><span class="o">$</span><span class="n">eqIncome</span><span class="w"> </span><span class="o">&amp;</span><span class="w"> </span><span class="n">eusilc</span><span class="o">$</span><span class="n">eqIncome</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="n">q</span><span class="p">[</span><span class="m">5</span><span class="p">])]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="s2">"(45,60]"</span><span class="w">
</span><span class="n">c_income</span><span class="p">[</span><span class="n">which</span><span class="p">(</span><span class="n">q</span><span class="p">[</span><span class="m">5</span><span class="p">]</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">eusilc</span><span class="o">$</span><span class="n">eqIncome</span><span class="w"> </span><span class="o">&amp;</span><span class="w"> </span><span class="n">eusilc</span><span class="o">$</span><span class="n">eqIncome</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="n">q</span><span class="p">[</span><span class="m">6</span><span class="p">])]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="s2">"(60,75]"</span><span class="w">
</span><span class="n">c_income</span><span class="p">[</span><span class="n">which</span><span class="p">(</span><span class="n">q</span><span class="p">[</span><span class="m">6</span><span class="p">]</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">eusilc</span><span class="o">$</span><span class="n">eqIncome</span><span class="w"> </span><span class="o">&amp;</span><span class="w"> </span><span class="n">eusilc</span><span class="o">$</span><span class="n">eqIncome</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="n">q</span><span class="p">[</span><span class="m">7</span><span class="p">])]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="s2">"(75,90]"</span><span class="w">
</span><span class="n">c_income</span><span class="p">[</span><span class="n">which</span><span class="p">(</span><span class="w">  </span><span class="n">eusilc</span><span class="o">$</span><span class="n">eqIncome</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="n">q</span><span class="p">[</span><span class="m">7</span><span class="p">]</span><span class="w"> </span><span class="p">)]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="s2">"(90,100]"</span><span class="w">

</span><span class="c1"># variables of interest</span><span class="w">
</span><span class="n">Y</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">data.frame</span><span class="p">(</span><span class="n">ecostat</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">eusilc</span><span class="o">$</span><span class="n">pl030</span><span class="p">)</span><span class="w">
</span><span class="n">Z</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">data.frame</span><span class="p">(</span><span class="n">c_income</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">c_income</span><span class="p">)</span><span class="w">

</span><span class="c1"># use the unit identifiers as row names</span><span class="w">
</span><span class="n">rownames</span><span class="p">(</span><span class="n">Xm</span><span class="p">)</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">rownames</span><span class="p">(</span><span class="n">Y</span><span class="p">)</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">rownames</span><span class="p">(</span><span class="n">Z</span><span class="p">)</span><span class="o">&lt;-</span><span class="w"> </span><span class="n">id</span><span class="w">

</span><span class="n">YZ</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">table</span><span class="p">(</span><span class="n">cbind</span><span class="p">(</span><span class="n">Y</span><span class="p">,</span><span class="n">Z</span><span class="p">))</span><span class="w">
</span><span class="n">addmargins</span><span class="p">(</span><span class="n">YZ</span><span class="p">)</span></code></pre></figure>

<figure class="highlight"><pre><code class="language-text" data-lang="text">#&gt;        c_income
#&gt; ecostat (0,15] (15,30] (30,45] (45,60] (60,75] (75,90] (90,100]   Sum
#&gt;     1      409     616     722     807     935    1025      648  5162
#&gt;     2      189     181     205     184     165     154       82  1160
#&gt;     3      137      90      72      75      59      52       33   518
#&gt;     4      210     159     103      95      74      49       46   736
#&gt;     5      470     462     492     477     459     435      351  3146
#&gt;     6       57      25      28      30      17      11       10   178
#&gt;     7      344     283     194     149     106      91       40  1207
#&gt;     Sum   1816    1816    1816    1817    1815    1817     1210 12107</code></pre></figure>
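<p>The seven <code class="language-plaintext highlighter-rouge">which</code>-based assignments above can be written more compactly with base R's <code class="language-plaintext highlighter-rouge">cut</code>, whose default right-closed intervals reproduce the same <code class="language-plaintext highlighter-rouge">(a,b]</code> classes. This is a minimal sketch of an alternative, not the vignette's own code: <code class="language-plaintext highlighter-rouge">discretize_income</code> is a hypothetical helper, and it is demonstrated on simulated incomes rather than on <code class="language-plaintext highlighter-rouge">eusilc</code>.</p>

<figure class="highlight"><pre><code class="language-r" data-lang="r"># Hypothetical helper: discretize a numeric vector into the same
# percentile classes as above, using cut() instead of which()
discretize_income &lt;- function(x) {
  # quantiles at 0, 0.15, ..., 0.90 (seq() stops before 1.05)
  q &lt;- quantile(x, probs = seq(0, 1, 0.15))
  labels &lt;- c("(0,15]", "(15,30]", "(30,45]", "(45,60]",
              "(60,75]", "(75,90]", "(90,100]")
  # right = TRUE (the default) gives intervals of the form (a, b];
  # -Inf and Inf make the first and last classes open-ended
  cut(x, breaks = c(-Inf, q[2:7], Inf), labels = labels, right = TRUE)
}

# Illustration on simulated incomes (not the eusilc data)
set.seed(42)
income &lt;- rlnorm(1000, meanlog = 10, sdlog = 0.5)
c_income_cut &lt;- discretize_income(income)
table(c_income_cut)</code></pre></figure>

<p>Returning a factor also keeps the classes in income order, which is convenient when printing the contingency table.</p>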

<h2 id="sampling-schemes">Sampling schemes</h2>

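<p>Here we set up the sampling designs and define all the quantities needed for the rest of the vignette. Two samples, of sizes <code class="language-plaintext highlighter-rouge">n1 = 1000</code> and <code class="language-plaintext highlighter-rouge">n2 = 500</code>, are selected independently by simple random sampling without replacement, so the design weights are the inverses of the inclusion probabilities, <code class="language-plaintext highlighter-rouge">N/n1</code> and <code class="language-plaintext highlighter-rouge">N/n2</code>.</p>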

<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># size of sample</span><span class="w">
</span><span class="n">n1</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="m">1000</span><span class="w">
</span><span class="n">n2</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="m">500</span><span class="w">

</span><span class="c1"># samples</span><span class="w">
</span><span class="n">s1</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">srswor</span><span class="p">(</span><span class="n">n1</span><span class="p">,</span><span class="n">N</span><span class="p">)</span><span class="w">
</span><span class="n">s2</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">srswor</span><span class="p">(</span><span class="n">n2</span><span class="p">,</span><span class="n">N</span><span class="p">)</span><span class="w">
  
</span><span class="c1"># matching variables of the sampled units</span><span class="w">
</span><span class="n">X1</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">Xm</span><span class="p">[</span><span class="n">s1</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">1</span><span class="p">,]</span><span class="w">
</span><span class="n">X2</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">Xm</span><span class="p">[</span><span class="n">s2</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">1</span><span class="p">,]</span><span class="w">
  
</span><span class="c1"># variables of interest: Y observed in sample 1, Z observed in sample 2</span><span class="w">
</span><span class="n">Y1</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">data.frame</span><span class="p">(</span><span class="n">Y</span><span class="p">[</span><span class="n">s1</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">1</span><span class="p">,])</span><span class="w">
</span><span class="n">colnames</span><span class="p">(</span><span class="n">Y1</span><span class="p">)</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">colnames</span><span class="p">(</span><span class="n">Y</span><span class="p">)</span><span class="w">
</span><span class="n">Z2</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">as.data.frame</span><span class="p">(</span><span class="n">Z</span><span class="p">[</span><span class="n">s2</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">1</span><span class="p">,])</span><span class="w">
</span><span class="n">colnames</span><span class="p">(</span><span class="n">Z2</span><span class="p">)</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">colnames</span><span class="p">(</span><span class="n">Z</span><span class="p">)</span><span class="w">
  
</span><span class="c1"># identifiers of the sampled units</span><span class="w">
</span><span class="n">id1</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">id</span><span class="p">[</span><span class="n">s1</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">1</span><span class="p">]</span><span class="w">
</span><span class="n">id2</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">id</span><span class="p">[</span><span class="n">s2</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">1</span><span class="p">]</span><span class="w">
  
</span><span class="c1"># use the identifiers as row names</span><span class="w">
</span><span class="n">rownames</span><span class="p">(</span><span class="n">Y1</span><span class="p">)</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">id1</span><span class="w">
</span><span class="n">rownames</span><span class="p">(</span><span class="n">Z2</span><span class="p">)</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">id2</span><span class="w">
  
</span><span class="c1"># design weights: inverse of the inclusion probabilities n1/N and n2/N</span><span class="w">
</span><span class="n">d1</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="n">N</span><span class="o">/</span><span class="n">n1</span><span class="p">,</span><span class="n">n1</span><span class="p">)</span><span class="w">
</span><span class="n">d2</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="n">N</span><span class="o">/</span><span class="n">n2</span><span class="p">,</span><span class="n">n2</span><span class="p">)</span><span class="w">
  
</span><span class="c1"># disjunctive (one-hot) form of Y and Z</span><span class="w">
</span><span class="n">Y_dis</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">sampling</span><span class="o">::</span><span class="n">disjunctive</span><span class="p">(</span><span class="n">as.matrix</span><span class="p">(</span><span class="n">Y</span><span class="p">))</span><span class="w">
</span><span class="n">Z_dis</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">sampling</span><span class="o">::</span><span class="n">disjunctive</span><span class="p">(</span><span class="n">as.matrix</span><span class="p">(</span><span class="n">Z</span><span class="p">))</span><span class="w">
  
</span><span class="n">Y1_dis</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">Y_dis</span><span class="p">[</span><span class="n">s1</span><span class="w"> </span><span class="o">==</span><span class="m">1</span><span class="p">,]</span><span class="w">
</span><span class="n">Z2_dis</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">Z_dis</span><span class="p">[</span><span class="n">s2</span><span class="w"> </span><span class="o">==</span><span class="m">1</span><span class="p">,]</span></code></pre></figure>
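<p>Because <code class="language-plaintext highlighter-rouge">d1</code> and <code class="language-plaintext highlighter-rouge">d2</code> are inverse inclusion probabilities, any weighted sum over a sample is a Horvitz-Thompson estimate of the corresponding population total: the weights sum to <code class="language-plaintext highlighter-rouge">N</code>, and <code class="language-plaintext highlighter-rouge">colSums(d1 * Y1_dis)</code> estimates the number of units in each <code class="language-plaintext highlighter-rouge">ecostat</code> category. The self-contained sketch below illustrates this on a simulated population, using base R's <code class="language-plaintext highlighter-rouge">sample.int</code> in place of <code class="language-plaintext highlighter-rouge">srswor</code>; the population and all names in it are hypothetical.</p>

<figure class="highlight"><pre><code class="language-r" data-lang="r"># Simulated population (hypothetical, not the eusilc data)
set.seed(1)
N_pop &lt;- 12000
ecostat_pop &lt;- factor(sample(1:7, N_pop, replace = TRUE))

# Simple random sampling without replacement of n units
n &lt;- 1000
s &lt;- sample.int(N_pop, n)

# Design weights: inverse of the inclusion probability n / N_pop
d &lt;- rep(N_pop / n, n)

# Horvitz-Thompson estimates of the category counts vs. the true counts
ht_counts &lt;- tapply(d, ecostat_pop[s], sum)
rbind(estimate = ht_counts, population = as.vector(table(ecostat_pop)))</code></pre></figure>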

<h2 id="harmonization">Harmonization</h2>

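<p>Next comes the harmonization step. The <code class="language-plaintext highlighter-rouge">harmonize</code> function adjusts the design weights of both samples so that they reproduce the same totals of the matching variables, and returns the harmonized weights <code class="language-plaintext highlighter-rouge">w1</code> and <code class="language-plaintext highlighter-rouge">w2</code>. By default, these common totals are estimated within the function; if the true population totals are known, they can be supplied through the <code class="language-plaintext highlighter-rouge">totals</code> argument instead, with the population size <code class="language-plaintext highlighter-rouge">N</code> as the first element.</p>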

<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">re</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">harmonize</span><span class="p">(</span><span class="n">X1</span><span class="p">,</span><span class="n">d1</span><span class="p">,</span><span class="n">id1</span><span class="p">,</span><span class="n">X2</span><span class="p">,</span><span class="n">d2</span><span class="p">,</span><span class="n">id2</span><span class="p">)</span><span class="w">  

</span><span class="c1"># alternatively, harmonize on the known population totals (N first, then the totals of Xm)</span><span class="w">
</span><span class="n">re</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">harmonize</span><span class="p">(</span><span class="n">X1</span><span class="p">,</span><span class="n">d1</span><span class="p">,</span><span class="n">id1</span><span class="p">,</span><span class="n">X2</span><span class="p">,</span><span class="n">d2</span><span class="p">,</span><span class="n">id2</span><span class="p">,</span><span class="n">totals</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="n">N</span><span class="p">,</span><span class="n">colSums</span><span class="p">(</span><span class="n">Xm</span><span class="p">)))</span><span class="w">

</span><span class="n">w1</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">re</span><span class="o">$</span><span class="n">w1</span><span class="w">
</span><span class="n">w2</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">re</span><span class="o">$</span><span class="n">w2</span><span class="w">

</span><span class="n">colSums</span><span class="p">(</span><span class="n">Xm</span><span class="p">)</span></code></pre></figure>

<figure class="highlight"><pre><code class="language-text" data-lang="text">#&gt;      1      2      3      4      5      6      7      8      9     10     11 
#&gt;    476    887   2340    763   1880   1021   2244   1938    558   6263   5844 
#&gt;     12     13     14  hsize    age 
#&gt;  11073    283    751  36380 559915</code></pre></figure>

<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">colSums</span><span class="p">(</span><span class="n">w1</span><span class="o">*</span><span class="n">X1</span><span class="p">)</span></code></pre></figure>

<figure class="highlight"><pre><code class="language-text" data-lang="text">#&gt;      1      2      3      4      5      6      7      8      9     10     11 
#&gt;    476    887   2340    763   1880   1021   2244   1938    558   6263   5844 
#&gt;     12     13     14  hsize    age 
#&gt;  11073    283    751  36380 559915</code></pre></figure>

<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">colSums</span><span class="p">(</span><span class="n">w2</span><span class="o">*</span><span class="n">X2</span><span class="p">)</span></code></pre></figure>

<figure class="highlight"><pre><code class="language-text" data-lang="text">#&gt;      1      2      3      4      5      6      7      8      9     10     11 
#&gt;    476    887   2340    763   1880   1021   2244   1938    558   6263   5844 
#&gt;     12     13     14  hsize    age 
#&gt;  11073    283    751  36380 559915</code></pre></figure>
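<p>The three outputs coincide because both sets of harmonized weights are calibrated to the same totals. To illustrate what calibrating weights to known totals means, the sketch below implements linear (GREG-type) calibration in base R plus <code class="language-plaintext highlighter-rouge">MASS::ginv</code>. This is a generic textbook method chosen for illustration; it is an assumption, not necessarily the algorithm <code class="language-plaintext highlighter-rouge">harmonize</code> uses internally, and <code class="language-plaintext highlighter-rouge">calibrate_linear</code> is a hypothetical helper.</p>

<figure class="highlight"><pre><code class="language-r" data-lang="r">library(MASS)  # for ginv(), a generalized inverse

# Hypothetical helper: linear calibration of design weights d so that
# colSums(X * w) equals the target totals.
# w_k = d_k * (1 + x_k' lambda), where lambda solves
# (X' D X) lambda = totals - X' d
calibrate_linear &lt;- function(X, d, totals) {
  X &lt;- as.matrix(X)
  A &lt;- crossprod(X * d, X)          # X' diag(d) X
  gap &lt;- totals - colSums(X * d)    # distance to the target totals
  # ginv() also copes with collinear dummy columns when totals are consistent
  lambda &lt;- ginv(A) %*% gap
  as.vector(d * (1 + X %*% lambda))
}

# Toy example: intercept, one dummy and one numeric variable
set.seed(3)
n &lt;- 50
X &lt;- cbind(one = 1, female = rbinom(n, 1, 0.5), age = runif(n, 18, 80))
d &lt;- rep(20, n)                     # e.g. N = 1000 and n = 50
totals &lt;- c(one = 1000, female = 510, age = 47000)
w &lt;- calibrate_linear(X, d, totals)
colSums(X * w)                      # reproduces totals</code></pre></figure>

<p>Linear calibration can produce negative weights when a sample is far from the targets; <code class="language-plaintext highlighter-rouge">sampling::calib</code> also offers raking, truncated and logit methods, which keep the weights within bounds.</p>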

<h2 id="optimal-transport-matching">Optimal transport matching</h2>

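<p>Statistical matching is performed with the <code class="language-plaintext highlighter-rouge">otmatch</code> function, which returns one row per matched pair <code class="language-plaintext highlighter-rouge">(id1, id2)</code> together with its transported weight. The contingency table is then estimated by looking up the <code class="language-plaintext highlighter-rouge">y</code> value of each <code class="language-plaintext highlighter-rouge">id1</code> unit and the <code class="language-plaintext highlighter-rouge">z</code> value of each <code class="language-plaintext highlighter-rouge">id2</code> unit, and summing the pair weights within each cell with <code class="language-plaintext highlighter-rouge">tapply</code>.</p>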

<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Optimal transport matching</span><span class="w">
</span><span class="n">object</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">otmatch</span><span class="p">(</span><span class="n">X1</span><span class="p">,</span><span class="n">id1</span><span class="p">,</span><span class="n">X2</span><span class="p">,</span><span class="n">id2</span><span class="p">,</span><span class="n">w1</span><span class="p">,</span><span class="n">w2</span><span class="p">)</span><span class="w">
</span><span class="n">head</span><span class="p">(</span><span class="n">object</span><span class="p">[,</span><span class="m">1</span><span class="o">:</span><span class="m">3</span><span class="p">])</span></code></pre></figure>

<figure class="highlight"><pre><code class="language-text" data-lang="text">#&gt;         id1    id2    weight
#&gt; 702     702 251702 11.509002
#&gt; 1      1401   1401 13.550397
#&gt; 2506   2506 194004  8.315938
#&gt; 2506.1 2506 324205  2.013395
#&gt; 3001   3001 494702 10.976034
#&gt; 3602   3602 503002 12.651816</code></pre></figure>

<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">Y1_ot</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">cbind</span><span class="p">(</span><span class="n">X1</span><span class="p">[</span><span class="nf">as.character</span><span class="p">(</span><span class="n">object</span><span class="o">$</span><span class="n">id1</span><span class="p">),],</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Y1</span><span class="p">[</span><span class="nf">as.character</span><span class="p">(</span><span class="n">object</span><span class="o">$</span><span class="n">id1</span><span class="p">),])</span><span class="w">
</span><span class="n">Z2_ot</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">cbind</span><span class="p">(</span><span class="n">X2</span><span class="p">[</span><span class="nf">as.character</span><span class="p">(</span><span class="n">object</span><span class="o">$</span><span class="n">id2</span><span class="p">),],</span><span class="n">z</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Z2</span><span class="p">[</span><span class="nf">as.character</span><span class="p">(</span><span class="n">object</span><span class="o">$</span><span class="n">id2</span><span class="p">),])</span><span class="w">
</span><span class="n">YZ_ot</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">tapply</span><span class="p">(</span><span class="n">object</span><span class="o">$</span><span class="n">weight</span><span class="p">,</span><span class="nf">list</span><span class="p">(</span><span class="n">Y1_ot</span><span class="o">$</span><span class="n">y</span><span class="p">,</span><span class="n">Z2_ot</span><span class="o">$</span><span class="n">z</span><span class="p">),</span><span class="n">sum</span><span class="p">)</span><span class="w">

</span><span class="c1"># transform NA into 0</span><span class="w">
</span><span class="n">YZ_ot</span><span class="p">[</span><span class="nf">is.na</span><span class="p">(</span><span class="n">YZ_ot</span><span class="p">)]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="m">0</span><span class="w">

</span><span class="c1"># result</span><span class="w">
</span><span class="nf">round</span><span class="p">(</span><span class="n">addmargins</span><span class="p">(</span><span class="n">YZ_ot</span><span class="p">),</span><span class="m">3</span><span class="p">)</span></code></pre></figure>

<figure class="highlight"><pre><code class="language-text" data-lang="text">#&gt;       (0,15]  (15,30]  (30,45]  (45,60]  (60,75]  (75,90] (90,100]       Sum
#&gt; 1    908.206  732.717  739.153  768.961  886.744  804.436  505.966  5346.183
#&gt; 2    229.633  157.397  125.376  231.835  232.178  166.663  105.011  1248.094
#&gt; 3    111.164  105.834   47.015   68.783   81.041   51.124   51.106   516.067
#&gt; 4     60.987   66.797  104.289  210.126   76.667   92.290   12.085   623.241
#&gt; 5    549.988  566.912  482.875  446.948  400.627  362.297  356.626  3166.273
#&gt; 6      8.577   37.881   14.943   51.063    0.000   10.138    0.000   122.602
#&gt; 7    176.177  164.798  193.780  190.152  119.938  166.566   73.129  1084.540
#&gt; Sum 2044.732 1832.336 1707.432 1967.869 1797.195 1653.514 1103.922 12107.000</code></pre></figure>
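<p>Since the estimated table is a weighted cross-tabulation of the <code class="language-plaintext highlighter-rouge">y</code> value of <code class="language-plaintext highlighter-rouge">id1</code> and the <code class="language-plaintext highlighter-rouge">z</code> value of <code class="language-plaintext highlighter-rouge">id2</code>, base R's <code class="language-plaintext highlighter-rouge">xtabs</code> can compute it directly, and it fills empty cells with 0, so no NA-to-0 step is needed. The sketch below uses hypothetical toy pairs shaped like the <code class="language-plaintext highlighter-rouge">otmatch</code> output; the identifiers, categories and lookup vectors are invented for illustration.</p>

<figure class="highlight"><pre><code class="language-r" data-lang="r"># Hypothetical matched pairs in the shape returned by otmatch:
# one row per (id1, id2) pair with its transported weight
pairs &lt;- data.frame(
  id1    = c(1, 2, 2, 3),
  id2    = c(10, 11, 12, 12),
  weight = c(5.0, 3.5, 1.5, 4.0)
)

# Hypothetical lookups: y is observed in sample 1, z in sample 2
y_of_id1 &lt;- c("1" = "y1", "2" = "y2", "3" = "y1")
z_of_id2 &lt;- c("10" = "(0,15]", "11" = "(15,30]", "12" = "(90,100]")
pairs$y &lt;- y_of_id1[as.character(pairs$id1)]
pairs$z &lt;- z_of_id2[as.character(pairs$id2)]

# Weighted contingency table; empty cells are 0 rather than NA
YZ_toy &lt;- xtabs(weight ~ y + z, data = pairs)
addmargins(YZ_toy)</code></pre></figure>

<p>Each row total equals the summed weight of the <code class="language-plaintext highlighter-rouge">id1</code> units in that category (9 for <code class="language-plaintext highlighter-rouge">y1</code>, 5 for <code class="language-plaintext highlighter-rouge">y2</code>), mirroring how the row sums of <code class="language-plaintext highlighter-rouge">YZ_ot</code> carry the sample 1 weights.</p>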

<h2 id="balanced-sampling">Balanced sampling</h2>

<p>As the previous section shows, optimal transport does not in general produce a one-to-one match: a unit of sample 1 may be matched to several units of sample 2 with nonzero weight (unit 2506, for instance, is paired with units 194004 and 324205). The <code class="language-plaintext highlighter-rouge">bsmatch</code> function creates a one-to-one match by selecting a stratified balanced sample, which yields a data.frame in which each unit of sample 1 has exactly one imputed unit from sample 2.</p>

<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Balanced Sampling </span><span class="w">
</span><span class="n">BS</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">bsmatch</span><span class="p">(</span><span class="n">object</span><span class="p">)</span><span class="w">
</span><span class="n">head</span><span class="p">(</span><span class="n">BS</span><span class="o">$</span><span class="n">object</span><span class="p">[,</span><span class="m">1</span><span class="o">:</span><span class="m">3</span><span class="p">])</span></code></pre></figure>

<figure class="highlight"><pre><code class="language-text" data-lang="text">#&gt;         id1    id2    weight
#&gt; 702     702 251702 11.509002
#&gt; 1      1401   1401 13.550397
#&gt; 2506   2506 194004  8.315938
#&gt; 3001   3001 494702 10.976034
#&gt; 3602   3602 503002 12.651816
#&gt; 3901.1 3901 294601 10.970414</code></pre></figure>

<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="n">Y1_bs</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">cbind</span><span class="p">(</span><span class="n">X1</span><span class="p">[</span><span class="nf">as.character</span><span class="p">(</span><span class="n">BS</span><span class="o">$</span><span class="n">object</span><span class="o">$</span><span class="n">id1</span><span class="p">),],</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Y1</span><span class="p">[</span><span class="nf">as.character</span><span class="p">(</span><span class="n">BS</span><span class="o">$</span><span class="n">object</span><span class="o">$</span><span class="n">id1</span><span class="p">),])</span><span class="w">
</span><span class="n">Z2_bs</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">cbind</span><span class="p">(</span><span class="n">X2</span><span class="p">[</span><span class="nf">as.character</span><span class="p">(</span><span class="n">BS</span><span class="o">$</span><span class="n">object</span><span class="o">$</span><span class="n">id2</span><span class="p">),],</span><span class="n">z</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Z2</span><span class="p">[</span><span class="nf">as.character</span><span class="p">(</span><span class="n">BS</span><span class="o">$</span><span class="n">object</span><span class="o">$</span><span class="n">id2</span><span class="p">),])</span><span class="w">
</span><span class="n">YZ_bs</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">tapply</span><span class="p">(</span><span class="n">BS</span><span class="o">$</span><span class="n">object</span><span class="o">$</span><span class="n">weight</span><span class="o">/</span><span class="n">BS</span><span class="o">$</span><span class="n">q</span><span class="p">,</span><span class="nf">list</span><span class="p">(</span><span class="n">Y1_bs</span><span class="o">$</span><span class="n">y</span><span class="p">,</span><span class="n">Z2_bs</span><span class="o">$</span><span class="n">z</span><span class="p">),</span><span class="n">sum</span><span class="p">)</span><span class="w">
</span><span class="n">YZ_bs</span><span class="p">[</span><span class="nf">is.na</span><span class="p">(</span><span class="n">YZ_bs</span><span class="p">)]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="m">0</span><span class="w">
</span><span class="nf">round</span><span class="p">(</span><span class="n">addmargins</span><span class="p">(</span><span class="n">YZ_bs</span><span class="p">),</span><span class="m">3</span><span class="p">)</span></code></pre></figure>

<figure class="highlight"><pre><code class="language-text" data-lang="text">#&gt;       (0,15]  (15,30]  (30,45]  (45,60]  (60,75]  (75,90] (90,100]       Sum
#&gt; 1    950.180  747.323  753.780  706.298  903.800  786.384  498.417  5346.183
#&gt; 2    202.833  138.314  153.513  220.030  237.620  175.001  120.782  1248.094
#&gt; 3     93.911   92.113   52.996   78.145   85.264   42.861   70.776   516.067
#&gt; 4     69.117   58.966  102.611  227.049   68.017   85.771   11.710   623.241
#&gt; 5    516.973  554.095  495.790  464.549  395.341  331.096  408.429  3166.273
#&gt; 6      8.367   37.881   14.943   51.274    0.000   10.138    0.000   122.602
#&gt; 7    171.059  177.551  181.066  213.189  102.669  172.524   66.482  1084.540
#&gt; Sum 2012.440 1806.244 1754.699 1960.535 1792.710 1603.775 1176.597 12107.000</code></pre></figure>
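<p>The contingency table above is built in three steps. First, the matching weights are summed over each combination of the Y1 category and the Z2 income class with <code>tapply</code>. Second, empty cells, which <code>tapply</code> returns as <code>NA</code>, are set to zero. Third, <code>addmargins</code> appends the row and column totals. The sketch below reproduces these steps on a small example. The vectors <code>y</code>, <code>z</code> and <code>w</code> are hypothetical toy data standing in for <code>Y1_bs$y</code>, <code>Z2_bs$z</code> and <code>BS$object$weight/BS$q</code>.</p>

<figure class="highlight"><pre><code class="language-r" data-lang="r"># hypothetical toy data: one entry per matched pair
y &lt;- c(1, 1, 2, 2, 3)
z &lt;- c("low", "high", "low", "low", "high")
w &lt;- c(0.5, 1.5, 2, 1, 3)

# sum the weights in each (y, z) cell; empty cells come back as NA
tab &lt;- tapply(w, list(y, z), sum)
tab[is.na(tab)] &lt;- 0

# append row and column totals (named "Sum")
addmargins(tab)</code></pre></figure>

<p>The grand total in the bottom-right cell is simply <code>sum(w)</code>. With R 3.4.0 or later, <code>tapply(w, list(y, z), sum, default = 0)</code> fills the empty cells in a single call.</p>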

<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># With Z2 as auxiliary information for stratified balanced sampling.</span><span class="w">
</span><span class="n">BS</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">bsmatch</span><span class="p">(</span><span class="n">object</span><span class="p">,</span><span class="n">Z2</span><span class="p">)</span><span class="w">

</span><span class="n">Y1_bs</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">cbind</span><span class="p">(</span><span class="n">X1</span><span class="p">[</span><span class="nf">as.character</span><span class="p">(</span><span class="n">BS</span><span class="o">$</span><span class="n">object</span><span class="o">$</span><span class="n">id1</span><span class="p">),],</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Y1</span><span class="p">[</span><span class="nf">as.character</span><span class="p">(</span><span class="n">BS</span><span class="o">$</span><span class="n">object</span><span class="o">$</span><span class="n">id1</span><span class="p">),])</span><span class="w">
</span><span class="n">Z2_bs</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">cbind</span><span class="p">(</span><span class="n">X2</span><span class="p">[</span><span class="nf">as.character</span><span class="p">(</span><span class="n">BS</span><span class="o">$</span><span class="n">object</span><span class="o">$</span><span class="n">id2</span><span class="p">),],</span><span class="n">z</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Z2</span><span class="p">[</span><span class="nf">as.character</span><span class="p">(</span><span class="n">BS</span><span class="o">$</span><span class="n">object</span><span class="o">$</span><span class="n">id2</span><span class="p">),])</span><span class="w">
</span><span class="n">YZ_bs</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">tapply</span><span class="p">(</span><span class="n">BS</span><span class="o">$</span><span class="n">object</span><span class="o">$</span><span class="n">weight</span><span class="o">/</span><span class="n">BS</span><span class="o">$</span><span class="n">q</span><span class="p">,</span><span class="nf">list</span><span class="p">(</span><span class="n">Y1_bs</span><span class="o">$</span><span class="n">y</span><span class="p">,</span><span class="n">Z2_bs</span><span class="o">$</span><span class="n">z</span><span class="p">),</span><span class="n">sum</span><span class="p">)</span><span class="w">
</span><span class="n">YZ_bs</span><span class="p">[</span><span class="nf">is.na</span><span class="p">(</span><span class="n">YZ_bs</span><span class="p">)]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="m">0</span><span class="w">
</span><span class="nf">round</span><span class="p">(</span><span class="n">addmargins</span><span class="p">(</span><span class="n">YZ_bs</span><span class="p">),</span><span class="m">3</span><span class="p">)</span></code></pre></figure>

<figure class="highlight"><pre><code class="language-text" data-lang="text">#&gt;       (0,15]  (15,30]  (30,45]  (45,60]  (60,75]  (75,90] (90,100]       Sum
#&gt; 1    916.607  733.295  727.348  783.917  893.807  804.205  487.003  5346.183
#&gt; 2    215.298  139.840  115.908  246.175  238.891  195.369   96.613  1248.094
#&gt; 3    105.798  103.158   75.489   55.427   72.337   42.861   60.997   516.067
#&gt; 4     46.193   70.368  114.037  190.878   91.443   98.613   11.710   623.241
#&gt; 5    571.876  569.674  459.916  460.539  378.955  356.515  368.799  3166.273
#&gt; 6      8.367   37.881   14.943   51.274    0.000   10.138    0.000   122.602
#&gt; 7    186.803  174.473  191.122  180.442  130.024  144.693   76.984  1084.540
#&gt; Sum 2050.942 1828.688 1698.763 1968.652 1805.457 1652.393 1102.105 12107.000</code></pre></figure>

<h2 id="prediction">Prediction</h2>

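<p>The prediction step rescales the matching weights within each unit <code>id1</code> so that they sum to one. It then multiplies the weighted indicator matrix of <code>id1</code> by the indicator matrix of the matched Z2 classes. Each row of the result is a probability distribution over the income classes for one unit of the first sample. The sketch below shows the same idea in base R, using <code>model.matrix()</code> to build the indicator (one-hot) matrices. This assumes that <code>disjunctive()</code> returns a 0/1 matrix with one column per level. The vectors <code>id1</code>, <code>z</code> and <code>w</code> are hypothetical toy data.</p>

<figure class="highlight"><pre><code class="language-r" data-lang="r"># hypothetical toy matching: each row links a unit id1 to a donor
# carrying the income class z, with matching weight w
id1 &lt;- factor(c("a", "a", "b", "c", "c", "c"))
z   &lt;- factor(c("low", "mid", "high", "low", "low", "mid"),
              levels = c("low", "mid", "high"))
w   &lt;- c(1, 3, 2, 0.5, 0.5, 1)

# normalise the weights within each id1; ave() keeps the original row order
q &lt;- ave(w, id1, FUN = function(x) x / sum(x))

# indicator matrices without intercept: one column per level
D1 &lt;- model.matrix(~ id1 - 1)
D2 &lt;- model.matrix(~ z - 1)

# row i is the predicted distribution of z for unit i
Z_pred &lt;- t(q * D1) %*% D2
colnames(Z_pred) &lt;- levels(z)
rownames(Z_pred) &lt;- levels(id1)
Z_pred</code></pre></figure>

<p>Here, unit <code>a</code> receives the distribution (0.25, 0.75, 0) and unit <code>c</code> receives (0.5, 0.5, 0). Every row sums to one. The two approaches to normalisation differ in how they order the output. <code>ave()</code> returns the normalised weights aligned with the input rows. By contrast, concatenating the output of <code>split()</code> returns them grouped by level of <code>id1</code>, so they line up with the rows only when those rows are already ordered by <code>id1</code>.</p>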
<figure class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># split the matching weights by id1</span><span class="w">
</span><span class="n">q_l</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">split</span><span class="p">(</span><span class="n">object</span><span class="o">$</span><span class="n">weight</span><span class="p">,</span><span class="n">f</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">object</span><span class="o">$</span><span class="n">id1</span><span class="p">)</span><span class="w">
</span><span class="c1"># normalize the weights within each id1 so that they sum to one</span><span class="w">
</span><span class="n">q_l</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">lapply</span><span class="p">(</span><span class="n">q_l</span><span class="p">,</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">x</span><span class="p">){</span><span class="n">x</span><span class="o">/</span><span class="nf">sum</span><span class="p">(</span><span class="n">x</span><span class="p">)})</span><span class="w">
</span><span class="n">q</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">as.numeric</span><span class="p">(</span><span class="n">do.call</span><span class="p">(</span><span class="n">c</span><span class="p">,</span><span class="n">q_l</span><span class="p">))</span><span class="w">
  
</span><span class="n">Z_pred</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">t</span><span class="p">(</span><span class="n">q</span><span class="o">*</span><span class="n">disjunctive</span><span class="p">(</span><span class="n">object</span><span class="o">$</span><span class="n">id1</span><span class="p">))</span><span class="o">%*%</span><span class="n">disjunctive</span><span class="p">(</span><span class="n">Z2</span><span class="p">[</span><span class="nf">as.character</span><span class="p">(</span><span class="n">object</span><span class="o">$</span><span class="n">id2</span><span class="p">),])</span><span class="w">
</span><span class="n">colnames</span><span class="p">(</span><span class="n">Z_pred</span><span class="p">)</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">levels</span><span class="p">(</span><span class="n">factor</span><span class="p">(</span><span class="n">Z2</span><span class="o">$</span><span class="n">c_income</span><span class="p">))</span><span class="w">
</span><span class="n">head</span><span class="p">(</span><span class="n">Z_pred</span><span class="p">)</span></code></pre></figure>

<figure class="highlight"><pre><code class="language-text" data-lang="text">#&gt;         (0,15]   (15,30]    (30,45] (45,60] (60,75] (75,90] (90,100]
#&gt; [1,] 0.0000000 0.0000000 1.00000000       0       0       0        0
#&gt; [2,] 0.0000000 0.0000000 0.00000000       1       0       0        0
#&gt; [3,] 0.1949201 0.8050799 0.00000000       0       0       0        0
#&gt; [4,] 0.0000000 0.0000000 0.00000000       0       1       0        0
#&gt; [5,] 0.0000000 0.0000000 1.00000000       0       0       0        0
#&gt; [6,] 0.7749145 0.1455486 0.07953691       0       0       0        0</code></pre></figure>]]></content><author><name>Raphaël Jauslin</name></author><summary type="html"><![CDATA[Example of statistical matching using optimal transport.]]></summary></entry></feed>