<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>kforner</title>
<link>https://kforner.netlify.app/</link>
<atom:link href="https://kforner.netlify.app/index.xml" rel="self" type="application/rss+xml"/>
<description>Karl Forner&#39;s blog</description>
<generator>quarto-1.4.554</generator>
<lastBuildDate>Mon, 20 Oct 2025 22:00:00 GMT</lastBuildDate>
<item>
  <title>Rfuzzycoco released on CRAN</title>
  <dc:creator>Karl Forner</dc:creator>
  <link>https://kforner.netlify.app/posts/rfuzzycoco_on_cran/</link>
  <description><![CDATA[ 





<p>My <strong>Rfuzzycoco</strong> package has just landed on CRAN: <a href="https://cran.r-project.org/web/packages/Rfuzzycoco/index.html">https://cran.r-project.org/web/packages/Rfuzzycoco/index.html</a>!! Publishing to CRAN is a rigorous process, and it was particularly challenging because this package includes custom C++ code. I documented the preparation process, including the steps needed for the C++ integration, in a previous post: <a href="../preparing_rfuzzycoco_for_cran/">Preparing Rfuzzycoco for publication on CRAN</a></p>
<p><strong>The Fuzzy CoCo Algorithm</strong></p>
<p>The core algorithm, <em>Fuzzy CoCo: a cooperative-coevolutionary approach to fuzzy modeling</em>, ingeniously combines <strong>fuzzy logic</strong> with cooperative <strong>genetic algorithms</strong> to evolve clear, human-understandable models, making it a powerful tool for explainable machine learning (XAI).</p>
<p><strong>The C++ Foundation</strong></p>
<p>To make <strong>Rfuzzycoco</strong> possible, I first had to reimplement the main legacy <strong>Fuzzy CoCo</strong> implementation from scratch. I released this re-implementation as the <strong>fuzzycoco</strong> C++ library, available here: <a href="https://github.com/Lonza-RND-Data-Science/fuzzycoco">https://github.com/Lonza-RND-Data-Science/fuzzycoco</a>. You can read more details about this in my post: <a href="../fuzzycoco_release/">fuzzycoco: C++ open-source release of my re-implementation of the Fuzzy Coco algorithm</a>.</p>
<p><strong>Get started</strong></p>
<p>If you are interested in predicting or classifying your data with simple, human-understandable, stable rules, give <strong>Rfuzzycoco</strong> a try, or reach out to me. I’m also open to collaborations, as there are many exciting opportunities to enhance both the implementation and the algorithm itself.</p>
<p><em><a href="../../about">I (Karl Forner)</a> am currently working as a consultant, contact me if you want me to help you with using R, organizing development, developing R packages or more generally supporting your software development efforts.</em></p>



 ]]></description>
  <category>R</category>
  <category>fuzzycoco</category>
  <category>c++</category>
  <guid>https://kforner.netlify.app/posts/rfuzzycoco_on_cran/</guid>
  <pubDate>Mon, 20 Oct 2025 22:00:00 GMT</pubDate>
  <media:content url="https://kforner.netlify.app/posts/rfuzzycoco_on_cran/fuzzycoco_logo.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>clitable: a new R package to easily print pretty tables in the terminal</title>
  <dc:creator>Karl Forner</dc:creator>
  <link>https://kforner.netlify.app/posts/clitable_release/</link>
  <description><![CDATA[ 





<p>I am pleased to announce that <a href="https://github.com/kforner/clitable">clitable</a>, my new R package for printing tables in the terminal, has just made its way to <a href="https://cran.r-project.org/web/packages/clitable/index.html">CRAN</a>!! It lets you print pretty, colorful tables directly in the R console or terminal.</p>
<p>I am also quite proud that it was accepted on its first submission with nothing to fix, which in my experience is not that common. Getting a package through CRAN was also the subject of my previous post <a href="../preparing_rfuzzycoco_for_cran/index.html">Preparing Rfuzzycoco for publication on CRAN</a>. A CRAN submission review takes about a week on average, so imagine the delay if you have to resubmit 4 or 5 times. This was made possible by a combination of tools and good practices (and experience): devtools, testthat, roxygen2, covr, git, GitHub Actions (CI), rhub, pkgdown, codecov…</p>
<p>And once again, my package has <strong>100% test coverage</strong> <img src="https://kforner.netlify.app/posts/clitable_release/coverage.svg" class="img-fluid" alt="coverage badge"> (and yes it’s overkill but I am a Test addict…).</p>
<p>So what is it for? Pretty-printing tables (data frames, matrices) in the terminal, like this: <img src="https://kforner.netlify.app/posts/clitable_release/clitable.png" class="img-fluid" alt="a clitable example"></p>
<p>In this example, you can see:</p>
<ul>
<li>column headers, rendered in <strong>bold</strong></li>
<li>columns with adequate size</li>
<li>2 highlighted rows (in green)</li>
<li>NA values rendered with a custom style, <strong>strikethrough</strong></li>
<li>the <strong>flipper_len</strong> column with a <strong>heatmap</strong> background</li>
<li>the first value, in cell (1, 1), using a custom <a href="https://r-lib.github.io/crayon/">crayon</a> style</li>
</ul>
<p>The corresponding code is:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># the table to print</span></span>
<span id="cb1-2">  df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">head</span>(datasets<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>penguins, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>)</span>
<span id="cb1-3"></span>
<span id="cb1-4">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># custom crayon style </span></span>
<span id="cb1-5">  df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>species <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.character</span>(df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>species)</span>
<span id="cb1-6">  df[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>] <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> crayon<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">style</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ADELIE"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"underline"</span>,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"bgYellow"</span>)</span>
<span id="cb1-7"></span>
<span id="cb1-8">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># create clitable</span></span>
<span id="cb1-9">  ct <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cli_table</span>(df, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">header_style =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"bold"</span>,</span>
<span id="cb1-10">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">NA_style =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"strikethrough"</span>,</span>
<span id="cb1-11">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">heatmap_columns =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"flipper_len"</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xmin =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">180</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xmax =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>,</span>
<span id="cb1-12">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">hilite_rows =</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.na</span>(df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>sex) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>sex <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"female"</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>bill_dep <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">19</span>, </span>
<span id="cb1-13">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">hilite_style =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"bgGreen"</span></span>
<span id="cb1-14">  )</span>
<span id="cb1-15"></span>
<span id="cb1-16">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># print it</span></span>
<span id="cb1-17">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cat</span>(ct, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sep =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span></code></pre></div>
</div>
<p>Note that this is my most complex example.</p>
<p>You may ask, <strong>why would I want to print a table in the terminal??</strong> In R we already have multiple ways to render pretty tables in HTML, PDF or as images.</p>
<p>My personal use case is for the next version of a package of mine, <a href="https://kforner.github.io/srcpkgs/">srcpkgs</a>, which can render the results of testing or checking a collection of R source packages as multiple tables in the terminal. I need to <strong>highlight</strong> the rows that correspond to errors and to print the <strong>elapsed time</strong> column as a heatmap to quickly identify bottlenecks. I also wanted to make it compatible with the amazing <a href="https://r-lib.github.io/crayon/">crayon</a> and <a href="https://r-lib.github.io/cli/">cli</a> packages, so that the output is easy to customize.</p>
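<p>For illustration, here is a minimal sketch of that use case, reusing only the <code>cli_table()</code> arguments from the example above. The results data frame (package names, statuses, timings) is made up, and the <code>xmin</code>/<code>xmax</code> heatmap bounds and the <code>"bgRed"</code> highlight style are my own choices:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(clitable)

# made-up results of checking a collection of source packages
res &lt;- data.frame(
  package = c("pkgA", "pkgB", "pkgC", "pkgD"),
  status  = c("OK", "ERROR", "OK", "WARNING"),
  elapsed = c(3.2, 12.5, 45.1, 7.8)
)

ct &lt;- cli_table(res, header_style = "bold",
  # elapsed times as a heatmap, to spot the slow packages
  heatmap_columns = list("elapsed"), xmin = 0, xmax = max(res$elapsed),
  # highlight the rows with errors
  hilite_rows = res$status == "ERROR",
  hilite_style = "bgRed"
)
cat(ct, sep = "\n")</code></pre></div>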
<p>Another use case I foresee is for logging some results. The <a href="https://r-lib.github.io/cli/">cli</a> package is really useful for that, but seems to be missing an easy way to print tables, hence clitable!</p>
<p>You may also ask, <strong>aren’t there other R packages that can render pretty tables as text??</strong> There are indeed. For example, I have used the <a href="https://hughjonesd.github.io/huxtable/">huxtable</a> package a lot: it is really powerful and can render tables in LaTeX, HTML and text. But using it for my purpose is not that straightforward: to the best of my knowledge it is not compatible with <a href="https://r-lib.github.io/cli/">cli</a>, and it has a lot of dependencies. In comparison, <a href="https://github.com/kforner/clitable">clitable</a> is a dwarf, but it has minimal dependencies (<code>crayon</code>/<code>cli</code>), is easy to use and to extend, and is intended to work well with cli. Note that this is the very first version, and the interface may evolve if needed.</p>
<p>Feedback and contributions are welcome on GitHub.</p>
<p>Here are some more examples, taken from <code>clitable::demo()</code>:</p>
<p><img src="https://kforner.netlify.app/posts/clitable_release/styles1.png" class="img-fluid" alt="some styles"> <img src="https://kforner.netlify.app/posts/clitable_release/styles2.png" class="img-fluid" alt="more styles"> <img src="https://kforner.netlify.app/posts/clitable_release/heatmaps.png" class="img-fluid" alt="heatmaps"> <img src="https://kforner.netlify.app/posts/clitable_release/hilite.png" class="img-fluid" alt="hilite"></p>



 ]]></description>
  <category>R</category>
  <category>dev</category>
  <guid>https://kforner.netlify.app/posts/clitable_release/</guid>
  <pubDate>Wed, 15 Oct 2025 22:00:00 GMT</pubDate>
  <media:content url="https://kforner.netlify.app/posts/clitable_release/clitable.png" medium="image" type="image/png" height="101" width="144"/>
</item>
<item>
  <title>Preparing Rfuzzycoco for publication on CRAN</title>
  <dc:creator>Karl Forner</dc:creator>
  <link>https://kforner.netlify.app/posts/preparing_rfuzzycoco_for_cran/</link>
  <description><![CDATA[ 





<p>I recently released on GitHub an R package, <a href="https://github.com/Lonza-RND-Data-Science/Rfuzzycoco">Rfuzzycoco</a>, that provides the <strong>Fuzzy Coco</strong> algorithm by wrapping and extending my <a href="https://github.com/Lonza-RND-Data-Science/fuzzycoco">fuzzycoco</a> C++ implementation. It makes this software easy to install and use.</p>
<p>The Comprehensive R Archive Network (<strong>CRAN</strong>) is R’s main package repository. The quality of CRAN packages is enforced by a very strict submission process, which covers the code itself, the dependencies, the size of the package, the portability of file encodings and filenames, the documentation, the description of the package, the code examples etc…</p>
<p>Getting a package accepted can be a daunting and very time-consuming task, so much so that some developers just give up and release their package by other means, for example on <strong>GitHub</strong>.</p>
<p>It is even harder for packages with C++ code, because the build process has to be implemented in a portable way, and the package must work on the 3 major platforms (Linux, macOS and Windows), which use different compilers and implementations of the C++ standard library.</p>
<p>On the other hand, having your package on CRAN is a guarantee of quality and portability. There are also some useful services for users, such as the distribution of binary packages, or Debian/Ubuntu APT packages. For developers, when you submit a new version there are automated checks against all reverse dependencies, i.e.&nbsp;all packages using your package, for regression testing.</p>
<p>I will briefly explain how I am preparing to submit <a href="https://github.com/Lonza-RND-Data-Science/Rfuzzycoco">Rfuzzycoco</a> to CRAN, and the ecosystem and tools that I use. Some are very common and straightforward.</p>
<ul>
<li>I use the wonderful <a href="https://devtools.r-lib.org/">devtools</a> package to develop and test the package code.</li>
<li>documentation:
<ul>
<li>reference manual: I use <a href="https://roxygen2.r-lib.org/">roxygen2</a> to generate the function-level documentation from inline annotations in the source code. It is integrated in <a href="https://devtools.r-lib.org/">devtools</a>.</li>
<li>vignettes: I use <a href="https://rmarkdown.rstudio.com/">rmarkdown</a>.</li>
<li>website: I use <a href="https://pkgdown.r-lib.org/">pkgdown</a> to generate the HTML documentation from the roxygen doc and Rmarkdown vignettes and publish it on <a href="https://docs.github.com/en/pages">github pages</a> via the CI</li>
</ul></li>
<li>unit testing:
<ul>
<li>this is in my opinion <strong>the most fundamental aspect of development</strong>: it assesses the quality of the code and enables refactoring.</li>
<li>very surprisingly, tests are not mandatory for CRAN, but they are for me.</li>
<li>I use the <a href="https://testthat.r-lib.org/">testthat</a> package, also integrated in <a href="https://devtools.r-lib.org/">devtools</a></li>
<li>measuring the <strong>test coverage</strong> is also of paramount importance. I use <a href="https://covr.r-lib.org/">covr</a> for that; it can also measure the coverage of the C/C++ code included in the R package.</li>
<li>I use the <a href="https://app.codecov.io/gh/Lonza-RND-Data-Science/Rfuzzycoco">codecov service</a> to publish the test coverage results.</li>
<li><img src="https://kforner.netlify.app/posts/preparing_rfuzzycoco_for_cran/coverage.svg" class="img-fluid" alt="coverage badge">: I just achieved <strong>100% test coverage</strong> for both the R and C++ Rfuzzycoco code (excluding the fuzzycoco lib code, which by the way also has 100% test coverage). I explained in a previous post that in general it’s not worth trying to reach 100%, but it is for me.</li>
</ul></li>
<li>R CMD check:
<ul>
<li>this is a fundamental tool that implements lots of checks on your package, and also runs the tests in a realistic way. You should use it from the beginning. I integrate it in my Makefile (<code>make check</code>) and automate it in the CI.</li>
</ul></li>
<li>git: of course your code must be versioned, and should use branches for developing new features.</li>
<li>github (or equivalent devops platform). I will only discuss github here since that’s what I’m using for Rfuzzycoco
<ul>
<li>it takes care of distribution, and of collaboration via forks and pull requests</li>
<li>it provides issues for reporting bugs, and interacting with other developers and users</li>
<li>it also provides documentation, via the README.md and github pages</li>
<li><strong>Continuous Integration</strong> (CI) via <a href="https://github.com/features/actions">github actions</a>
<ul>
<li>this is also a fundamental feature. It can automate the checks, the documentation publishing and much more.</li>
<li>It can check your package on <strong>multiple platforms</strong></li>
<li>I currently have a CI for checking (R CMD check) on ubuntu, macos and windows, and on several versions of R (release and devel). This is an absolute <strong>killer feature</strong>, especially for CRAN since it can test the portability of your package.</li>
<li>I also have a CI that measures the test coverage, and automatically publishes it on <a href="https://app.codecov.io/gh/Lonza-RND-Data-Science/Rfuzzycoco">codecov</a></li>
<li>and a CI to publish the HTML documentation on <a href="https://docs.github.com/en/pages">github pages</a></li>
</ul></li>
</ul></li>
</ul>
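<p>To make this more concrete, here is a minimal sketch of the kind of code these tools work with: a function documented with roxygen2 tags and tested with testthat. The <code>rescale01()</code> function is hypothetical, made up for illustration. The <code>devtools</code>/<code>covr</code> calls at the end are meant to be run from the root of a package source directory, hence the <code>if (FALSE)</code> guard:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">#' Rescale a numeric vector to the 0-1 range (hypothetical example)
#'
#' @param x a numeric vector
#' @return the rescaled vector, NA values are kept as is
#' @export
rescale01 &lt;- function(x) {
  rng &lt;- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

library(testthat)
test_that("rescale01() maps the min to 0 and the max to 1", {
  expect_equal(rescale01(c(2, 4, 6)), c(0, 0.5, 1))
  expect_true(is.na(rescale01(c(1, NA, 3))[2]))
})

# the usual development cycle, from the package root directory
if (FALSE) {
  devtools::document()              # roxygen2 comments -&gt; man/*.Rd + NAMESPACE
  devtools::test()                  # run the testthat test suite
  cov &lt;- covr::package_coverage()   # R and C/C++ test coverage
  covr::percent_coverage(cov)
  devtools::check()                 # R CMD check
}</code></pre></div>
<p>In the CI, the same <code>check</code> and coverage steps are simply run automatically on each push, on every platform.</p>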
<p>The sooner this ecosystem is set up, the better. It certainly involves some work, but you can reuse all this infrastructure for other packages.</p>
<p>And I think one thing that is lacking is a standard R package project that would implement all this tooling in a standardized, optimized and well-maintained way. That would lower the barrier to entry to R package development and dramatically increase the overall quality.</p>
<p>Stay tuned for more on the Rfuzzycoco CRAN journey.</p>
<p><em><a href="../../about">I (Karl Forner)</a> am currently working as a consultant, contact me if you want me to help you with using R, organizing development, developing R packages or more generally supporting your software development efforts.</em></p>



 ]]></description>
  <category>R</category>
  <category>fuzzycoco</category>
  <category>c++</category>
  <guid>https://kforner.netlify.app/posts/preparing_rfuzzycoco_for_cran/</guid>
  <pubDate>Tue, 30 Sep 2025 22:00:00 GMT</pubDate>
  <media:content url="https://kforner.netlify.app/posts/preparing_rfuzzycoco_for_cran/coverage.svg" medium="image" type="image/svg+xml"/>
</item>
<item>
  <title>fuzzycoco: C++ open-source release of my re-implementation of the Fuzzy Coco algorithm</title>
  <dc:creator>Karl Forner</dc:creator>
  <link>https://kforner.netlify.app/posts/fuzzycoco_release/</link>
  <description><![CDATA[ 





<p>I am pleased to announce the open-source (GPL-3) release of my re-implementation of the <strong>Fuzzy Coco</strong> algorithm: <a href="https://github.com/Lonza-RND-Data-Science/fuzzycoco">https://github.com/Lonza-RND-Data-Science/fuzzycoco</a>.</p>
<p>In short, <strong>Fuzzy CoCo</strong> combines <em>fuzzy logic</em> with <em>cooperative genetic algorithms</em> to evolve clear, human-understandable models for explainable machine learning, cf <em>Fuzzy CoCo: a cooperative-coevolutionary approach to fuzzy modeling</em> from <a href="https://orcid.org/0000-0002-2113-6498">Carlos Andrés Peña-Reyes</a>.</p>
<p>This is my re-implementation of the FUGE_LC C++ software, developed by Jean-Philippe Meylan, Yvan Da Silva and Rochus Keller (cf <a href="https://github.com/Lonza-RND-Data-Science/fuzzycoco/blob/main/README.md#acknowledgements">full acknowledgements</a>).</p>
<p>The motivations for that re-implementation were mainly to be able to easily use and distribute this software using high-level dynamic languages such as <strong>R</strong> and <strong>Python</strong>.</p>
<p>Some of the reasons <strong>FUGE_LC</strong>, the original implementation, was not suitable for that:</p>
<ul>
<li>it uses (an old version of) the <strong>Qt</strong> C++ framework, which, even though Qt is now open-source, makes it quite difficult to bundle with an R or Python package. It was also quite difficult to set up and build.</li>
<li>it can only be used via a <strong>JavaScript script</strong> interpreted internally, which makes it really difficult to use properly from another language, especially to control the iterations of the algorithm.</li>
<li>we wanted to add new features.</li>
</ul>
<p>The characteristics of this re-implementation are:</p>
<ul>
<li>everything had to be rewritten, since the existing code made heavy use of Qt data structures and base classes, and was not designed for unit-testing. But all the algorithms and calculations are the same.</li>
<li>it uses <strong>standard C++17 and its standard library</strong> with not a single external dependency, making it easy to bundle with, for instance, an R package.</li>
<li>it includes 2 related new features, prototyped by Magali Egger: <strong>feature importance</strong> and <strong>genetic population biased initialization</strong>, based on that feature importance. That should also be the subject of a future post.</li>
<li>it is available as a C++ shared and static <strong>library</strong>, but still provides a C++ executable, as FUGE_LC.</li>
<li>it is extremely well tested: <strong>100% test code coverage</strong> (<a href="https://codecov.io/github/Lonza-RND-Data-Science/fuzzycoco"><img src="https://codecov.io/github/Lonza-RND-Data-Science/fuzzycoco/graph/badge.svg?token=UMCPQXVXQA" class="img-fluid" alt="codecov"></a>). It is commonly assumed that a test coverage over 90% or 95% is overkill. The last few percent are certainly by far the hardest to cover, but I am such a TDD (Test Driven Development) fanatic that I went all the way. I always gain some new insights into programming and code design from that exercise. I’ll probably write a post about unit testing and the related tools if there is some interest.</li>
<li>the test-driven design makes it really <strong>modular</strong>, so that new features can be added more easily.</li>
<li>since the goal is to publish an R package, the code has to be <strong>portable</strong>, at least on the 3 main operating systems that CRAN supports: Linux, macOS and Windows. With C++ this is really difficult, since each OS and compiler has its peculiarities. Using the GitHub CI (Continuous Integration), named <strong>GitHub Actions</strong>, the code is automatically tested on all 3 platforms.</li>
<li>the software is of course <strong>reproducible</strong>, meaning that with the same input (including the random seed), we get the same output. Actually I spotted that it was not true for <strong>FUGE_LC</strong>, and Magali Egger fixed that.</li>
<li>it is also <strong>cross-platform reproducible</strong>: the same input (including the random seed) produces the very same output on all 3 supported platforms, and I actually had a hard time achieving that. I’ll probably also write a post about that.</li>
<li>the current code has not been optimized for speed (yet), but for correctness and compatibility. But some obvious inefficiencies have been fixed. There is for sure plenty of room for optimization.</li>
</ul>
<p>I am currently working on the R package called <strong>Rfuzzycoco</strong>. It already works, and I am now preparing it for the <strong>CRAN</strong> submission.</p>
<p>Let me know if you are interested in this project.</p>



 ]]></description>
  <category>fuzzycoco</category>
  <category>c++</category>
  <guid>https://kforner.netlify.app/posts/fuzzycoco_release/</guid>
  <pubDate>Tue, 09 Sep 2025 22:00:00 GMT</pubDate>
  <media:content url="https://kforner.netlify.app/posts/fuzzycoco_release/fuzzycoco_illustration.png" medium="image" type="image/png" height="144" width="144"/>
</item>
<item>
  <title>Organizing R development using srcpkgs</title>
  <dc:creator>Karl Forner</dc:creator>
  <link>https://kforner.netlify.app/posts/organizing_dev_with_srcpkgs/</link>
  <description><![CDATA[ 





<section id="overview" class="level2">
<h2 class="anchored" data-anchor-id="overview">Overview</h2>
<p>This is an introduction to organizing R projects using source packages (powered by my R package <a href="https://kforner.github.io/srcpkgs/">srcpkgs</a>). It is based on notes for a talk I gave on 2024-05-27 at the <a href="https://www.sib.swiss/vital-it">Swiss Institute of Bioinformatics Vital-IT group</a> Analysts meeting.</p>
<p>The objective is to organize R projects in order to:</p>
<ul>
<li>reuse code</li>
<li>share code</li>
<li>increase robustness</li>
<li>enable analysis (code) reproducibility</li>
</ul>
<p>The context is mostly analysis-oriented R projects.</p>
<section id="r-packages" class="level3">
<h3 class="anchored" data-anchor-id="r-packages">R packages</h3>
<p>All R users use R packages, the core ones such as base, stats, tools, and some from CRAN or BioConductor.</p>
<p>Why would you want to use R packages for your own code???</p>
<p>An R package is:</p>
<ul>
<li>self-contained
<ul>
<li>it bundles together all related code, the documentation, the relevant data and tests</li>
</ul></li>
<li>the dependencies are explicitly stated, and are themselves R packages</li>
</ul>
</section>
</section>
<section id="on-the-natural-evolution-of-code-projects" class="level2">
<h2 class="anchored" data-anchor-id="on-the-natural-evolution-of-code-projects">On the natural evolution of code projects…</h2>
<p>My view on the general evolution of analysis projects:</p>
<ul>
<li><p>you start with a <strong>single script</strong>, sequential, with no functions</p></li>
<li><p>at one point (after writing hundreds or thousands of lines) you realize that you need some <strong>functions</strong></p></li>
<li><p>then you start reusing those functions across projects by copy/paste. This raises a number of problems:</p>
<ul>
<li>versioning: at one point you will fix or improve such a function
<ul>
<li>it may be difficult to remember which project contains the latest version</li>
<li>what about the projects that contain the incorrect versions?</li>
</ul></li>
</ul></li>
<li><p>then you may want, if you work in a team, to share this code with colleagues, or to use theirs</p>
<ul>
<li>–&gt; it requires some <strong>documentation</strong>, even terse.</li>
<li>there’s an increased <strong>responsibility</strong>. What if your code is wrong and impacts the projects of your colleagues? One remedy is to write tests for those functions.</li>
<li>those functions are seldom independent, so you cannot just pick one</li>
<li>all those functions are <em>exposed</em> (i.e.&nbsp;<em>public</em> or <em>exported</em>).
<ul>
<li>if you start using a low-level function in your project and, in the next version of the shared code, this function is changed or removed during a refactoring, updating the shared code will break your project.</li>
</ul></li>
</ul></li>
<li><p>for all those reasons you start packaging your reusable code as an <strong>R package</strong></p>
<ul>
<li>you can add documentation, tests, group code logically. It brings a namespace so that you can decide what you expose.</li>
</ul></li>
<li><p>But… it does NOT really solve the <strong>versioning</strong> problem</p>
<ul>
<li>in R, packages have to be <strong>installed</strong> (e.g.&nbsp;using <code>install.packages()</code>) before you can use them with <code>library(mypkg)</code></li>
<li>packages have a version number (N.B: this is not the same as <em>code versioning</em>)</li>
<li>if you use version v1 in your project A, and version v2 in project B, you have to juggle versions (install/uninstall). Of course there are some tools to deal with that (renv…), but they work with external packages (or you need some private custom repositories)</li>
<li>and it’s very cumbersome. Suppose that in your project A you find a bug in the installed package. In order to fix it, you need to:
<ul>
<li>fetch the source code of the package</li>
<li>try to reproduce your problem. Chances are that you need your project data and have to reproduce your session</li>
<li>finally, if you manage to fix it, you have to publish it and install it again.</li>
</ul></li>
</ul></li>
<li><p>my approach is to use what I call <strong>R source packages</strong></p>
<ul>
<li>they are normal R packages, but instead of installing them on your R system, you load them directly from source in your R session.</li>
<li>it was made possible by the famous <strong>Hadley Wickham</strong> and his <code>devtools::load_all()</code> function, which mimics the loading of an installed package</li>
<li>this greatly helps with all those problems:
<ul>
<li>you embed your source packages inside your project (as <em>git submodules</em>, we’ll see that later). This solves the versioning/reproducibility problem at the level of your reusable code: each of your projects may use a different version</li>
</ul></li>
<li>if you need to fix a bug, or improve and augment your reusable code, it’s as simple as editing the code in your project. And using <code>srcpkgs</code>, you can even easily reload the code inside your existing R sessions, without losing any computed data.</li>
</ul></li>
<li><p>so far so good. Then for ease of maintenance/modularity, you start splitting your reusable code by category, and develop several R packages, e.g.&nbsp;one for some misc utilities, one for loading data from your database, one for some specific analysis…</p>
<ul>
<li>this is where <code>srcpkgs</code> become usefuls, since <code>devtools</code> was designed to manage a <strong>single R source package</strong>, not a collection/<strong>library</strong> of possibly inter-dependent packages.
<ul>
<li><code>srcpkgs</code> additionally has a useful little hack that lets you use the standard <code>library()</code> function to load your source packages, so that when your analysis is finalized, or deployed in <em>production</em> with your packages installed the standard way, your script will continue to work without any change.</li>
</ul></li>
</ul></li>
<li><p>But this does not solve <strong>reproducibility</strong> for external packages</p>
<ul>
<li>your code and source library almost certainly use external packages, and also depend on your R version (and thus on the <em>Bioconductor</em> version)</li>
<li>it may also depend on your OS and architecture (CPU…)</li>
<li>this is out of scope for this talk, but one solution is to use a containerized development environment: a <strong>docker</strong> container (cf https://rocker-project.org/) that contains a fixed version of <strong>R</strong> and of all the needed external packages.</li>
<li>now the challenge is to synchronize that docker container version with your source library version…</li>
<li>also cf <a href="https://code.visualstudio.com/docs/devcontainers/containers">devcontainers</a></li>
</ul></li>
</ul>
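<p>Back to managing a library of inter-dependent source packages: one part of the job is loading each package after the packages it depends on. Here is a minimal Python sketch of that ordering step, using the standard <code>graphlib</code> module (Python 3.9+). It is only an illustration, not how <code>srcpkgs</code> is implemented: the package names and the <code>deps</code> mapping are hypothetical, and in practice the dependencies would come from each package’s <code>DESCRIPTION</code> file.</p>

```python
from graphlib import TopologicalSorter


def load_order(deps):
    """Return package names so that every package comes after its dependencies.

    deps maps a package name to the set of source packages it depends on
    (hypothetical input for illustration). Raises graphlib.CycleError if
    some packages depend on each other circularly.
    """
    return list(TopologicalSorter(deps).static_order())


# hypothetical library: misc utilities, a database loader built on them,
# and an analysis package using both
deps = {
    "myutils": set(),
    "mydb": {"myutils"},
    "myanalysis": {"mydb", "myutils"},
}
print(load_order(deps))  # prints ['myutils', 'mydb', 'myanalysis']
```

<p>A circular dependency such as <code>{"a": {"b"}, "b": {"a"}}</code> makes <code>static_order()</code> raise <code>graphlib.CycleError</code>, which is a useful early error when a library of packages grows.</p>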
<section id="summary" class="level3">
<h3 class="anchored" data-anchor-id="summary">Summary</h3>
<p><code>script --&gt; script+functions --&gt; script + source files --&gt; R package --&gt; R source package --&gt; R source library [ + R docker env]</code></p>
</section>
</section>
<section id="my-recommended-project-setup" class="level2">
<h2 class="anchored" data-anchor-id="my-recommended-project-setup">My recommended project setup</h2>
<ul>
<li>the source <strong>library</strong> of R packages
<ul>
<li>should be a <strong>single dedicated git repository</strong>
<ul>
<li>recommended since it’s easier to have consistent versions of interdependent packages</li>
<li>but each package could be in its own git repository if needed</li>
</ul></li>
<li>each package should contain <strong>tests</strong> (very important: even if it’s counter-intuitive, there is usually more value in the test suite than in the code itself, but don’t get me started on that…)</li>
<li>for internal packages, especially when the audience is developers, I personally think that <strong>documentation</strong> is less important than, for example, for a publicly released package.</li>
<li>you should use <strong>CI</strong> (Continuous Integration, like GitHub Actions or GitLab CI) to run the automated tests each time you push to the repository.</li>
<li>also, reporting the test coverage is important</li>
</ul></li>
<li>the <strong>project code</strong>
<ul>
<li>MUST be versioned in a git repository (in github/gitlab…): one repository per project</li>
<li>should itself be an R (source) package
<ul>
<li>easier to add tests, documentation, vignettes</li>
</ul></li>
<li>but can be a single script or a set of source files</li>
<li>should contain a given version (commit/tag/branch) of the source library as a <strong>git submodule</strong></li>
<li>should contain a <strong>vscode devcontainer</strong> to execute the project’s code (automatically usable via <strong>github codespaces</strong>)</li>
</ul></li>
<li>the project R code will then use the <code>srcpkgs</code> package, which automatically <strong>discovers</strong> the R packages contained in the project folder and transparently loads them using the <em>hacked</em> <code>library()</code> function, as if they were installed packages.</li>
</ul>
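<p>To give an idea of what such a <strong>discovery</strong> step involves, here is a hedged Python sketch that finds R packages in a project folder by looking for <code>DESCRIPTION</code> files with a <code>Package:</code> field. The function names and the rule of skipping hidden folders are assumptions made for illustration; <code>srcpkgs</code> may use different rules.</p>

```python
import os


def read_package_name(description_path):
    """Return the value of the 'Package:' field of an R DESCRIPTION file, or None."""
    with open(description_path, encoding="utf-8") as fh:
        for line in fh:
            if line.startswith("Package:"):
                return line.split(":", 1)[1].strip()
    return None


def find_source_packages(root):
    """Map package name -> package folder for every R package found under root.

    Hypothetical rule: a folder is an R package if it contains a DESCRIPTION
    file with a Package: field. Hidden folders (e.g. .git) are not explored.
    """
    found = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # prune hidden folders in place so os.walk does not descend into them
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        if "DESCRIPTION" in filenames:
            name = read_package_name(os.path.join(dirpath, "DESCRIPTION"))
            if name:
                found[name] = dirpath
    return found
```

<p>Called on a project folder that embeds the source library as a git submodule, it returns each package name with its folder, which is the information needed to then load those packages from source.</p>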
</section>
<section id="resources" class="level1">
<h1>Resources</h1>
<ul>
<li>the github repository of <a href="https://github.com/kforner/srcpkgs"><code>srcpkgs</code></a></li>
<li>the <a href="https://kforner.github.io/srcpkgs/">online documentation</a>
<ul>
<li>notably this demo vignette: <a href="https://kforner.github.io/srcpkgs/articles/demo.html">why would you need srcpkgs?</a></li>
</ul></li>
<li>This post should be available on <a href="https://www.r-bloggers.com">R bloggers</a></li>
</ul>
<p><a href="../../about">I (Karl Forner)</a> am currently working as a consultant, contact me if you want me to help you with using R, organizing development, developing R packages or, more generally, with supporting your software development efforts.</p>


</section>

 ]]></description>
  <category>R</category>
  <category>srcpkgs</category>
  <category>dev</category>
  <guid>https://kforner.netlify.app/posts/organizing_dev_with_srcpkgs/</guid>
  <pubDate>Thu, 17 Jul 2025 22:00:00 GMT</pubDate>
</item>
<item>
  <title>an elegant way to fix user IDs in docker containers using docker_userid_fixer</title>
  <dc:creator>Karl Forner</dc:creator>
  <link>https://kforner.netlify.app/posts/docker_userid_fixer_intro/</link>
  <description><![CDATA[ 





<section id="what-is-it-about" class="level2">
<h2 class="anchored" data-anchor-id="what-is-it-about">what is it about?</h2>
<p>It’s about a rather technical issue with docker containers that interact with the docker host computer, generally related to using the host filesystem inside the container. This happens in particular in reproducible research contexts. I developed an open-source utility that helps tackle that issue.</p>
</section>
<section id="docker-containers-as-execution-environments" class="level2">
<h2 class="anchored" data-anchor-id="docker-containers-as-execution-environments">docker containers as execution environments</h2>
<p>The initial and main use case of a docker container is a <em>self-contained</em> application that only interacts with the host system through some network ports. Think of a web application: the docker container typically contains a web server and a web application, running for example on port 80 (inside the container). The container is then run on the host by binding the container’s internal port 80 to a host port (e.g.&nbsp;8000). The only interaction between the containerized app and the host system is then via this bound network port.</p>
<p>Containers as execution environments are completely different:</p>
<ul>
<li>instead of containerizing an application, it’s the <strong>application build system</strong> that is containerized.
<ul>
<li>it could be a compiler, an IDE, a notebook engine, a Quarto publishing system…</li>
</ul></li>
<li>the goals are:
<ul>
<li>to have a <strong>standard</strong> environment that is easy to install and share
<ul>
<li>imagine a complex build environment, with fixed versions of R, Python and zillions of external packages. Installing everything with the right versions can be a very difficult and time-consuming task. Sharing a docker image with everything already installed and pre-configured is a real time-saver.</li>
</ul></li>
<li>to have a <strong>reproducible</strong> environment
<ul>
<li>by using it, you are able to reproduce some analysis results, since you are using the very same controlled environment</li>
<li>you can also easily reproduce bugs, which is the first step to fixing them</li>
</ul></li>
</ul></li>
</ul>
<p>But, in order to use those execution environments, those containers must have access to the host system, in particular to the host user filesystem.</p>
</section>
<section id="docker-containers-and-the-host-filesystem" class="level2">
<h2 class="anchored" data-anchor-id="docker-containers-and-the-host-filesystem">docker containers and the host filesystem</h2>
<p>Suppose you have containerized an IDE, e.g.&nbsp;RStudio. Your RStudio is installed and running inside the docker container, but it needs to read and edit files in your project folder.</p>
<p>For that you <strong>bind mount</strong> your project folder (in your host filesystem) using the docker run <code>--volume</code> option. Then your files are accessible from within the docker container.</p>
<p>The challenge now is file permissions. Suppose your host user has userid <strong>1001</strong>, and suppose that the user owning the RStudio process in the container is either <strong>0</strong> (root), or <strong>1002</strong>.</p>
<p>If the container user is <strong>root</strong>, then it will have no issue reading your files. But as soon as you edit some existing files, or produce new ones (e.g.&nbsp;pdf, html), these files will belong to root <strong>also on the host filesystem!</strong> Meaning that your local host user may not be able to modify or delete them, since they belong to root.</p>
<p>Now if the container user id is <strong>1002</strong>, RStudio may not be able to read your files, edit them or produce new files. Even if it can, by setting some very permissive permissions, your local host user may not be able to use the files it produces.</p>
<p>Of course one brute-force way of solving that issue is to run as root both on the host computer and within the docker container. This is not always possible and raises some obvious, critical security concerns.</p>
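<p>On the host side, the symptom is easy to check: files in your project folder that no longer belong to your user. Here is a small Python sketch (the <code>foreign_files</code> helper is hypothetical, not part of any tool mentioned here) that lists them by comparing each owner uid with your own:</p>

```python
import os


def foreign_files(root, uid=None):
    """List paths under root (files and folders) whose owner uid differs from uid.

    uid defaults to the current user's uid (e.g. 1001 on the host in the
    example above). Symbolic links are examined themselves (lstat), not their targets.
    """
    if uid is None:
        uid = os.getuid()
    result = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_uid != uid:
                result.append(path)
    return sorted(result)
```

<p>Run on a project folder after a session in a container running as root, it would typically list the generated files (pdf, html…) owned by uid 0.</p>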
</section>
<section id="solving-the-file-owner-issue-part-1-the-docker-run---user-option" class="level2">
<h2 class="anchored" data-anchor-id="solving-the-file-owner-issue-part-1-the-docker-run---user-option">solving the file owner issue part 1: the docker run <code>--user</code> option</h2>
<p>Because we cannot know in advance what the host userid will be (here <strong>1001</strong>), we cannot pre-configure the userid of the docker container user.</p>
<p><strong>docker run</strong> provides a <code>--user</code> option that makes it possible to create a <strong>pseudo</strong> user with a supplied userid at runtime. For example, <code>docker run --user 1001 ...</code> will create a docker container whose processes belong to a user with userid <strong>1001</strong>.</p>
<p>So why are we still discussing this issue? Isn’t it solved?</p>
<p>Here are some quirks of that dynamically created user:</p>
<ul>
<li>it is a pseudo user</li>
<li>it does not have a home directory (/home/xxx)</li>
<li>it does not appear in <code>/etc/passwd</code></li>
<li>it cannot be pre-configured, e.g.&nbsp;with a bash profile, some env vars, application defaults etc…</li>
</ul>
<p>We can work around these problems, but it can be tedious and frustrating. What we’d really like is to pre-configure a docker container user, and be able to dynamically change its <strong>userid</strong> at <strong>runtime</strong>…</p>
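<p>To see the “does not appear in <code>/etc/passwd</code>” quirk from a program’s point of view, here is a small Python sketch using the standard <code>pwd</code> module (Unix only). The <code>describe_user</code> helper is hypothetical; it shows that a uid with no passwd entry has neither a user name nor a home directory to fall back on.</p>

```python
import os
import pwd


def describe_user(uid=None):
    """Return (name, home) for uid, or (None, None) if uid has no passwd entry.

    Inside a container started with e.g. `docker run --user 1001`, uid 1001
    usually has no entry in the container's /etc/passwd, so pwd.getpwuid()
    raises KeyError.
    """
    if uid is None:
        uid = os.getuid()
    try:
        entry = pwd.getpwuid(uid)
        return entry.pw_name, entry.pw_dir
    except KeyError:
        # no passwd entry: no user name, and no home directory to configure
        return None, None
```

<p>Programs that look up the current user this way (for a home directory, a profile, defaults…) are exactly the ones that behave oddly with such a pseudo user.</p>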
</section>
<section id="solving-the-file-owner-issue-part-2-enter-docker_userid_fixer" class="level2">
<h2 class="anchored" data-anchor-id="solving-the-file-owner-issue-part-2-enter-docker_userid_fixer">solving the file owner issue part 2: enter <code>docker_userid_fixer</code></h2>
<p><a href="https://github.com/kforner/docker_userid_fixer">docker_userid_fixer</a> is an open source utility intended to be used as a <strong>docker entrypoint</strong> to fix the userid issue I just raised.</p>
<p>Let’s see how to use it: you set it as your docker <code>ENTRYPOINT</code>, specifying which user should be used and have its <em>userid</em> dynamically modified:</p>
<pre><code>ENTRYPOINT ["/usr/local/bin/docker_userid_fixer","user1"]</code></pre>
<p>Let’s be precise in our terms:</p>
<ul>
<li>the <strong>target</strong> user is the user passed to docker_userid_fixer, here <strong>user1</strong></li>
<li>the <strong>requested</strong> user is the user provisioned by <code>docker run</code>, i.e.&nbsp;the user that (initially) owns the first process (PID 1)</li>
</ul>
<p>Then, when the container is created at runtime, there are two cases:</p>
<ul>
<li>either the <strong>requested</strong> userid (already) matches the <strong>target</strong> userid, and nothing has to be changed</li>
<li>or it does not. For example the <strong>requested</strong> userid is <strong>1001</strong>, and the <strong>target</strong> userid is <strong>1000</strong>. Then <code>docker_userid_fixer</code> will change the userid of the <strong>target</strong> user <strong>user1</strong> from 1000 to 1001, directly in the container main process.</li>
</ul>
<p>So in practice this solves our issue:</p>
<ul>
<li>if you do not need to fix your container userid, just use docker run the usual way (without the <code>--user</code> option)</li>
<li>or you use the <code>--user</code> option: in addition to running your main process with the userid you requested, <code>docker_userid_fixer</code> will change the userid of your pre-configured user to the requested one, so that your container runs with your intended user and intended userid.</li>
</ul>
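<p>The two cases above can be sketched in Python as a pure function over the contents of <code>/etc/passwd</code> (format <code>name:password:UID:GID:GECOS:home:shell</code>). This only illustrates the idea under simplifying assumptions and is not the actual <code>docker_userid_fixer</code> code, which is a compiled executable: the <code>fix_passwd</code> name is hypothetical, and a real tool may also need to handle groups or the ownership of the user’s home directory.</p>

```python
def fix_passwd(passwd_text, target_user, requested_uid):
    """Return (new_passwd_text, changed) with target_user's UID set to requested_uid.

    passwd_text uses the /etc/passwd format: name:password:UID:GID:GECOS:home:shell.
    If the target user already has requested_uid, the text is returned unchanged.
    """
    lines = passwd_text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        fields = line.rstrip("\n").split(":")
        if fields[0] != target_user:
            continue
        if int(fields[2]) == requested_uid:
            return passwd_text, False  # case 1: the userids already match
        fields[2] = str(requested_uid)  # case 2: e.g. 1000 -> 1001
        ending = "\n" if line.endswith("\n") else ""
        lines[i] = ":".join(fields) + ending
        return "".join(lines), True
    raise KeyError(f"user {target_user!r} not found in passwd data")
```

<p>With the line <code>user1:x:1000:1000::/home/user1:/bin/bash</code> and a requested userid of 1001, the line becomes <code>user1:x:1001:1000::/home/user1:/bin/bash</code>; with a requested userid of 1000, nothing changes.</p>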
</section>
<section id="docker_userid_fixer-setup" class="level2">
<h2 class="anchored" data-anchor-id="docker_userid_fixer-setup">docker_userid_fixer setup</h2>
<p>You can find instructions about the setup <a href="https://github.com/kforner/docker_userid_fixer#setup">here</a>.</p>
<p>But it boils down to:</p>
<ul>
<li>build or download the tiny executable (17k)</li>
<li>copy it into your docker image</li>
<li>make it executable as setuid root</li>
<li>configure it as your entrypoint</li>
</ul>
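<p>The setuid root step is the easiest one to get wrong. As a sanity check, here is a Python sketch (the <code>is_setuid_root</code> helper is hypothetical) that tests whether a file is a regular file owned by root with the setuid bit set, which is what <code>chown root</code> followed by <code>chmod u+s</code> produces:</p>

```python
import os
import stat


def is_setuid_root(path):
    """True if path is a regular file owned by root (uid 0) with the setuid bit set.

    Such an executable runs with an effective uid of 0, whichever user launches it.
    """
    st = os.stat(path)
    return stat.S_ISREG(st.st_mode) and st.st_uid == 0 and bool(st.st_mode & stat.S_ISUID)
```

<p>Inside the image, <code>is_setuid_root("/usr/local/bin/docker_userid_fixer")</code> should then return <code>True</code>, assuming the executable was copied to the path used in the <code>ENTRYPOINT</code> above.</p>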
</section>
<section id="the-gory-details" class="level2">
<h2 class="anchored" data-anchor-id="the-gory-details">the gory details</h2>
<p>I have put some short notes here: <a href="https://github.com/kforner/docker_userid_fixer#how-it-works">https://github.com/kforner/docker_userid_fixer#how-it-works</a>, but I’ll try to rephrase them.</p>
<p>The crux of the implementation is that the <code>docker_userid_fixer</code> executable in the container is <strong>setuid root</strong>. We need root permissions to change the userid, and the setuid bit grants that privileged execution only to the <code>docker_userid_fixer</code> program, and only for a very short time.</p>
<p>As soon as the userid has been modified (if needed), <code>docker_userid_fixer</code> switches the main process to the requested user (and userid!).</p>
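<p>That final switch follows the classic privilege-dropping pattern of setuid programs. Here is a Python sketch of the pattern only (the real tool is a standalone executable and its code may differ; <code>drop_privileges</code> is a hypothetical name): while the process still has root privileges, it clears the supplementary groups, sets the group id, then the user id. The order matters because once the uid is no longer root, the process can no longer change its groups.</p>

```python
import os


def drop_privileges(uid, gid):
    """Irreversibly switch the current process from root to uid/gid.

    Supplementary groups and the gid must be changed while the process still
    has root privileges: after os.setuid() it can no longer change them.
    Raises PermissionError when the process lacks those privileges.
    """
    os.setgroups([])  # drop supplementary groups, such as root's
    os.setgid(gid)
    os.setuid(uid)  # as root, this sets the real, effective and saved uids
    if os.getuid() != uid or os.geteuid() != uid:
        raise RuntimeError("failed to drop privileges")
```

<p>An entrypoint would then typically replace itself with the container command, e.g.&nbsp;with <code>os.execvp</code>, so that the command runs as PID 1 under the requested user.</p>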


</section>

 ]]></description>
  <category>docker</category>
  <category>reproducible_research</category>
  <category>devops</category>
  <guid>https://kforner.netlify.app/posts/docker_userid_fixer_intro/</guid>
  <pubDate>Tue, 13 Aug 2024 22:00:00 GMT</pubDate>
</item>
</channel>
</rss>
