In the past couple of weeks, Ximin Luo worked on making R generate reproducible output. This is now mostly complete, and we're waiting on feedback from upstream about our patch. In the meantime, there are a few packages that remain unreproducible, but the issue probably lies in those specific packages rather than the R toolchain. Perhaps you can help out with them, after reading this!


R packages compile into a .rdb database format that contains the package's definitions, plus a .rdx index file for easy lookup in the .rdb file. Usually, there is a main rdb with the package contents, plus another rdb that stores the help data. There is also a paths.rds (same format as .rdx) that contains some more stuff.

One can actually read these files by hand using Rscript, see the diffoscope code in rdata.py. If you run that, you can see that path.rds contains some obvious paths, but the other files contain less obvious stuff, and in fact these scripts give identical output for the .rdb files, even though they are bitwise different. To get to the bottom of this, we'll have to use the R debugger.

Attached to this post is a script that smooths this process. I ran this against Debian's R packages, but it probably also works with other distros' R packages - try it and see.

You run it like ./r-mini-repro-test.sh $pkgdir $builddir and it will output some hashes for you; make sure to install the build dependencies first. You should manually vary both $pkgdir and $builddir to introduce the build-path variations; $builddir can be an arbitrary string but $pkgdir should point to the actual R package's source directory, so I just copy that to two locations and point the script at each of them in turn.

Now, we can begin debugging. Before I did this, we had 478 unreproducible R packages so the biggest problem was likely with R itself. I downloaded the source code of both R and a small example package (r-cran-tensor), then figured out how the R packages were actually being built. This resulted in me writing the script above, to speed up debugging. You can read about R's debugger here (no HTTPS), it's a bit primitive but it supports the basic stuff (step, continue, where, print/eval) and that was adequate for me.

Debugging R core

So let's take a small example with Debian R 3.4.0-1:

$ ../r-mini-repro-test.sh r-cran-tensor-1.5 123
[..]
* installing *source* package ‘tensor’ ...
** package ‘tensor’ successfully unpacked and MD5 sums checked
** R
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (tensor)
[..]
c71a766429151a13e46039f3c37a14edc704004f410c34f8f37a2f098506421d  ./debian/123/usr/lib/R/site-library/tensor/R/tensor.rdx
736b5b4a885db71fb3fe2a538faeefe43c500c4080dbf2c75b97135b98401acd  ./debian/123/usr/lib/R/site-library/tensor/R/tensor.rdb
8a3f9616de7ee78a83b07395dd6ec43c5259d3f1b1b653d397808fc9fc5d96c2  ./debian/123/usr/lib/R/site-library/tensor/help/tensor.rdx
d1ac634987b24606a5da627ec073f45edce2bddf071391dd17c246a0a01eba63  ./debian/123/usr/lib/R/site-library/tensor/help/tensor.rdb
80796cc1a2cfcac40c394153f67617d575e103a405430b8483c2261a903db046  ./debian/123/usr/lib/R/site-library/tensor/help/paths.rds
$ ../r-mini-repro-test.sh r-cran-tensor-1.5 1234
[..]
5f081f4d6f9da1fbe06e374067e6636d926de4cdf09886058b15c5928c636c0d  ./debian/1234/usr/lib/R/site-library/tensor/R/tensor.rdx
7145669891e1b766ca0d8d89fa9c317af716e61a62650abe3a11a848240f1d57  ./debian/1234/usr/lib/R/site-library/tensor/R/tensor.rdb
8a3f9616de7ee78a83b07395dd6ec43c5259d3f1b1b653d397808fc9fc5d96c2  ./debian/1234/usr/lib/R/site-library/tensor/help/tensor.rdx
d1ac634987b24606a5da627ec073f45edce2bddf071391dd17c246a0a01eba63  ./debian/1234/usr/lib/R/site-library/tensor/help/tensor.rdb
80796cc1a2cfcac40c394153f67617d575e103a405430b8483c2261a903db046  ./debian/1234/usr/lib/R/site-library/tensor/help/paths.rds

The files are created by the ** preparing package for lazy loading and ** help steps. In retrospect this seems somewhat obvious from the output, but I had never done R before so I spent a while using strace to be sure; one can see that those files are indeed written to after those strings are printed.

Then we can grep the R source code for these strings.

r-base-3.4.0$ grep -Ri "preparing package for lazy loading" src
src/library/tools/R/install.R:                starsmsg(stars, "preparing package for lazy loading")

Reading the file, we see that this happens in the tools:::.install_packages function, and reading it further we see that it calls makeLazyLoading then code2LazyLoadDB then makeLazyLoadDB. Seems promising, let's confirm it before chasing potential wild geese.

Important: In the rest of these command-line outputs I'll prepend what I input with >>> but this doesn't actually get printed by Rscript. So don't be surprised that you don't see these, when you're trying to recreate my steps.

Also I am a total R noob so possibly there are more elegant ways to do what I'm about to show you; this is what I came up with after 2-3 hours of research.

$ DEBUG=1 ../r-mini-repro-test.sh r-cran-tensor-1.5 123
enter (some or all of) the following into R:
[.. helpful commands ..]
+ R_DEFAULT_PACKAGES= LC_COLLATE=C /usr/lib/R/bin/R --no-restore --slave --args nextArg-lnextArgdebian/123/usr/lib/R/site-librarynextArg-dnextArg.nextArg--built-timestamp="Thu, 01 Jan 1970 00:00:00 +0000"
>>> debug(tools:::makeLazyLoadDB)
>>> tools:::.install_packages()
processing ‘.’
[..]
** preparing package for lazy loading
debugging in: makeLazyLoadDB(ns, dbbase, compress = compress)
debug: {
[.. source code of makeLazyLoadDB, not yet run ..]
}

OK so we just entered the function and it's paused, in a separate shell let's check what it's doing:

$ find r-cran-tensor-1.5 -name '*.rdb'
[ no output ]

So, no files have been written yet, which means none of the parent functions did anything, we are in the right function (or it's in a child). If we continue the function, we get:

>>> c
exiting from: makeLazyLoadDB(ns, dbbase, compress = compress)
** help
debugging in: makeLazyLoadDB(db, file.path(manOutDir, basename(outDir)))
debug: {
[.. source code of makeLazyLoadDB, not yet run ..]
}

$ find r-cran-tensor-1.5 -name '*.rdb'
r-cran-tensor-1.5/debian/123/usr/lib/R/site-library/tensor/R/tensor.rdb

Then continuing again:

>>> c
exiting from: makeLazyLoadDB(db, file.path(manOutDir, basename(outDir)))
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (tensor)

$ find r-cran-tensor-1.5 -name '*.rdb'
r-cran-tensor-1.5/debian/123/usr/lib/R/site-library/tensor/help/tensor.rdb
r-cran-tensor-1.5/debian/123/usr/lib/R/site-library/tensor/R/tensor.rdb

So it seems likely this function is the correct one. After reading the source code of makeLazyLoadDB, even though I don't understand the whole thing, it looks like possibly this part is the one that does the actual writing:

    for (i in seq_along(vars)) {
        key <- if (is.null(from) || is.environment(from))
            lazyLoadDBinsertVariable(vars[i], from, datafile,
                                     ascii, compress,  envhook)
        else lazyLoadDBinsertListElement(from, i, datafile, ascii,
                                         compress, envhook)
        assign(vars[i], key, envir = varenv)
    }

So let's step through it and see if the data contains any paths.

$ DEBUG=1 ../r-mini-repro-test.sh r-cran-tensor-1.5 123
[..]
>>> debug(tools:::makeLazyLoadDB)
>>> tools:::.install_packages()
processing ‘.’
[..]
** preparing package for lazy loading
debugging in: makeLazyLoadDB(ns, dbbase, compress = compress)
debug: {
[.. source code of makeLazyLoadDB, not yet run ..]
}
>>>                                                                             # pressing <Enter> just means "step"
debug: ascii <- as.logical(ascii)                                               # this is the first line of the function
>>>                                                                             # keep stepping
[..]
>>>
debug: for (i in seq_along(vars)) {
[.. source code of the inside of the block, not yet run ..]
}
>>>
debug: key <- if (is.null(from) || is.environment(from)) lazyLoadDBinsertVariable(vars[i],
    from, datafile, ascii, compress, envhook) else lazyLoadDBinsertListElement(from,
    i, datafile, ascii, compress, envhook)
>>>
debug: lazyLoadDBinsertVariable(vars[i], from, datafile, ascii, compress,
    envhook)                                                                    # not yet run

If we look at the source code we see that it will try to insert the value of .Internal(getVarsFromFrame(vars[i], from, FALSE))[[1L]] into the file, so let's evaluate it:

>>> .Internal(getVarsFromFrame(vars[i], from, FALSE))[[1L]]
function (x, y)
tensor(x, y, 2, 2)
<environment: namespace:tensor>
>>> typeof(.Internal(getVarsFromFrame(vars[i], from, FALSE))[[1L]])
[1] "closure"
>>> environment(.Internal(getVarsFromFrame(vars[i], from, FALSE))[[1L]])        # get the environment of a closure
<environment: namespace:tensor>
>>> envlist(environment(.Internal(getVarsFromFrame(vars[i], from, FALSE))[[1L]])) # environment as an envlist, more detailed output
$`%*t%`
function (x, y)
tensor(x, y, 2, 2)
<environment: namespace:tensor>

$`%t*%`
function (x, y)
tensor(x, y, 1, 1)
<environment: namespace:tensor>

$`%t*t%`
function (x, y)
tensor(x, y, 1, 2)
<environment: namespace:tensor>

$.__NAMESPACE__.
<environment: 0x55eb55d00be0>

$.__S3MethodsTable__.
<environment: 0x55eb55d166e8>

$.packageName
[1] "tensor"

$tensor
function (A, B, alongA = integer(0), alongB = integer(0))
{
[.. function definition ..]
}
<environment: namespace:tensor>

Nothing obviously containing paths so far, let's keep stepping and dumping the objects

>>>
debug: assign(vars[i], key, envir = varenv)
>>>                                                                             # next iteration now
debug: key <- if (is.null(from) || is.environment(from)) lazyLoadDBinsertVariable(vars[i],
    from, datafile, ascii, compress, envhook) else lazyLoadDBinsertListElement(from,
    i, datafile, ascii, compress, envhook)
>>>
debug: lazyLoadDBinsertVariable(vars[i], from, datafile, ascii, compress,
    envhook)
>>> .Internal(getVarsFromFrame(vars[i], from, FALSE))[[1L]]
function (x, y)
tensor(x, y, 1, 1)
<environment: namespace:tensor>
>>>
[.. more stepping ..]
debug: lazyLoadDBinsertVariable(vars[i], from, datafile, ascii, compress,
    envhook)
>>> .Internal(getVarsFromFrame(vars[i], from, FALSE))[[1L]]
<environment: 0x55eb55d00be0>
>>> envlist(.Internal(getVarsFromFrame(vars[i], from, FALSE))[[1L]])
$S3methods
     [,1] [,2] [,3]

$dynlibs
NULL

$exports
<environment: 0x55eb55d0a4f8>

$imports
$imports$base
[1] TRUE


$lazydata
<environment: 0x55eb55d0a178>
attr(,"name")
[1] "lazydata:tensor"

$path
[1] "/mystupidbuildpath/r-tests/r-cran-tensor-1.5/debian/123/usr/lib/R/site-library/tensor"

$spec
    name  version
"tensor"    "1.5"

Oh, here we go. (In fact we could have got this earlier, it was part of the .__NAMESPACE__. key of the first environment that we examined.) Naturally, you'll sometimes miss stuff, and sometimes go down dead ends when doing something like this, perseverance is important.

The names are also a hint, after grepping the R source code for names like "S3methods" and "lazydata" we see that src/library/base/R/namespace.R is the only one that contains all of them, then reading that code in more depth reveals this.

After patching it, rebuilding r-base, installing this patched version, then running our tests again, we see that we've eliminated differences in the main rdb file but not the help rdb file. :(

In fact, after some more time stepping through the loop repeatedly and printing out all the objects, nothing obvious shows up. Time to re-read the source code of makeLazyLoadDB. We notice that all the lazyLoadDBinsert* functions have a hook parameter that we haven't been stepping through because it's in a separate function. So let's do that. Since it's an inner function, one has to call debug() inside makeLazyLoadDB:

$ DEBUG=1 ../r-mini-repro-test.sh r-cran-tensor-1.5 123
[..]
>>> debug(tools:::makeLazyLoadDB)
>>> tools:::.install_packages()
processing ‘.’
[..]
** preparing package for lazy loading
debugging in: makeLazyLoadDB(ns, dbbase, compress = compress)
debug: {
[.. source code of makeLazyLoadDB, not yet run ..]
}
>>> c
exiting from: makeLazyLoadDB(ns, dbbase, compress = compress)
** help
debugging in: makeLazyLoadDB(db, file.path(manOutDir, basename(outDir)))
debug: {
[.. source code of makeLazyLoadDB, not yet run ..]
}
[.. stepping through the function ..]
debug: envhook <- function(e) {
[.. function definition ..]
}
>>>                                                                             # step next, to define that function
[.. next statement, we don't care about it ..]
>>> debug(envhook)
[.. then keep stepping through makeLazyLoadDB ..]
>>>
debug: lazyLoadDBinsertVariable(vars[i], from, datafile, ascii, compress,
    envhook)
[.. then keep stepping through makeLazyLoadDB ..]
debug: lazyLoadDBinsertVariable(vars[i], from, datafile, ascii, compress,
    envhook)
[.. hook wasn't run for these previous iterations for some reason, but keep stepping ..]
[.. eventually we get ..]

debugging in: (function (e)
{
[.. function definition ..]
})(<environment>)                                                               # "(..)(<environment>)" is R's name for that function (actually a closure)
debug: {
[.. source code, not yet run ..]
}
>>> e
/mystupidbuildpath/r-tests/r-cran-tensor-1.5/man/tensor.Rd
>>> typeof(e)
[1] "environment"
>>> envlist(e)
$Enc
[1] "UTF-8"

$encoding
[1] ""

$filename
[1] "/mystupidbuildpath/r-tests/r-cran-tensor-1.5/man/tensor.Rd"

$timestamp
[1] "2002-01-16 11:51:14 CET"

$wd
[1] "/mystupidbuildpath/r-tests/r-cran-tensor-1.5"

After doing the same thing as before, i.e. grepping the R source code for "Enc" "timestamp" etc, we spot a few different places that are setting the wd field.

Then after some experimenting, we find that overwriting this in a different file (src/library/tools/R/parseRd.R) seems to work to get rid of this irreproducibility. (This may not be the best; our exact strategy is still being discussed with upstream, but at least we know which areas of the code are responsible.)

Exercise for the reader to figure out how to get rid of the paths.rds difference. After doing the previous two, this one is fairly easy.

These three issues allowed us to write a preliminary patch that enabled us to reproduce r-cran-tensor.

$ ./r-mini-repro-test.sh r-cran-tensor-1.5 123
[..]
bca9de1351ab265475cf6f1af3f6aff49bde1d209e8fafb33954caa8d90f3462  ./debian/123/usr/lib/R/site-library/tensor/R/tensor.rdx
ac7fffeb751ffc2d13c5317704b19787a7078446d97d1c6f259c93f282e04cef  ./debian/123/usr/lib/R/site-library/tensor/R/tensor.rdb
d989603340d1e8f14c35551ef379956653bf3b6e7b1868b4bf9a66d85820998e  ./debian/123/usr/lib/R/site-library/tensor/help/tensor.rdx
30e543cda9f97c202dc9c0291933940dfe71b94d352f66d8ed6c035c1a722b41  ./debian/123/usr/lib/R/site-library/tensor/help/tensor.rdb
e9f78ba2cca39a6face1ed88fce4bb718d911d567d01db49bc00a3011720ac50  ./debian/123/usr/lib/R/site-library/tensor/help/paths.rds
$ ./r-mini-repro-test.sh r-cran-tensor-1.5 1234
[..]
bca9de1351ab265475cf6f1af3f6aff49bde1d209e8fafb33954caa8d90f3462  ./debian/1234/usr/lib/R/site-library/tensor/R/tensor.rdx
ac7fffeb751ffc2d13c5317704b19787a7078446d97d1c6f259c93f282e04cef  ./debian/1234/usr/lib/R/site-library/tensor/R/tensor.rdb
d989603340d1e8f14c35551ef379956653bf3b6e7b1868b4bf9a66d85820998e  ./debian/1234/usr/lib/R/site-library/tensor/help/tensor.rdx
30e543cda9f97c202dc9c0291933940dfe71b94d352f66d8ed6c035c1a722b41  ./debian/1234/usr/lib/R/site-library/tensor/help/tensor.rdb
e9f78ba2cca39a6face1ed88fce4bb718d911d567d01db49bc00a3011720ac50  ./debian/1234/usr/lib/R/site-library/tensor/help/paths.rds
$ ./r-mini-repro-test.sh r-cran-tensor 1234
[..]
bca9de1351ab265475cf6f1af3f6aff49bde1d209e8fafb33954caa8d90f3462  ./debian/1234/usr/lib/R/site-library/tensor/R/tensor.rdx
ac7fffeb751ffc2d13c5317704b19787a7078446d97d1c6f259c93f282e04cef  ./debian/1234/usr/lib/R/site-library/tensor/R/tensor.rdb
d989603340d1e8f14c35551ef379956653bf3b6e7b1868b4bf9a66d85820998e  ./debian/1234/usr/lib/R/site-library/tensor/help/tensor.rdx
30e543cda9f97c202dc9c0291933940dfe71b94d352f66d8ed6c035c1a722b41  ./debian/1234/usr/lib/R/site-library/tensor/help/tensor.rdb
e9f78ba2cca39a6face1ed88fce4bb718d911d567d01db49bc00a3011720ac50  ./debian/1234/usr/lib/R/site-library/tensor/help/paths.rds

Testing the patch

Of course r-cran-tensor is just 1 package, so then we tested this patch on all 478 tagged R packages. Before the patch, all of these were unreproducible. Now 463/478 are reproducible, hurray!

The first version of the patch made 2 packages FTBFS ("fail to build from source"), r-bioc-biobase and r-cran-shinybs; this was fixed in a subsequent version of the patch. (Some other packages, r-cran-randomfields r-cran-randomfieldsutils, FTBFS even with an unpatched r-base, due to differences between 3.3.3 and 3.4.0 (#861333), and not because of our patch.)

Given the overwhelming proportion of packages that did reproduce, the other 14 packages that are still unreproducible, are quite probably due to issues in those specific packages. Only 2 of these are certainly due to build-path differences: r-cran-runit r-cran-rinside. This is because, we currently see they are reproducible in Debian testing but not Debian unstable. This is slightly misleading, the real reason is not testing vs unstable directly; rather the fact that we (at the time of writing) use the same build path when trying to reproduce Debian testing packages, and different paths for unstable. The other 12 are unreproducible in both testing and unstable, so it either suffers from a non-build-path issue as well as a build-path issue, or only a non-build-path issue.

So, further investigation of each package is needed. This is where you can help. :)

Debugging R packages

To investigate why (for example) r-cran-runit doesn't reproduce despite our patches, we just do the above process again:

$ DEBUG=1 ../r-mini-repro-test.sh r-cran-runit
[..]
>>> debug(tools:::makeLazyLoadDB)
>>> tools:::.install_packages()
processing ‘.’
a directory
* build_help_types=
* DBG: 'R CMD INSTALL' now doing do_install()
* created lock directory ‘/mystupidbuildpath/r-tests/r-cran-runit/debian/r-cran-runit/usr/lib/R/site-library/00LOCK-r-cran-runit’
* installing *source* package ‘RUnit’ ...
** package ‘RUnit’ successfully unpacked and MD5 sums checked
** backing up earlier installation
** R
** inst
** preparing package for lazy loading
debugging in: makeLazyLoadDB(ns, dbbase, compress = compress)
debug: {
[.. source code of makeLazyLoadDB ..]
}

We know our irreproducibility is with the help files and not the main package, so c to skip this part:

>>> c
exiting from: makeLazyLoadDB(ns, dbbase, compress = compress)
** help
debugging in: makeLazyLoadDB(db, file.path(manOutDir, basename(outDir)))
debug: {
[.. source code of makeLazyLoadDB ..]
}

Examine the "from" object, grep the output for our builddir

>>> from
$`RUnit-internal`
[..]
$`RUnit-intro`
[..]
[.. etc ..]
[..]
$checkFuncs
\title{RUnit check [..]
[..]
  For a simple example see the provided test cases in
  /mystupidbuildpath/r-tests/r-cran-runit/debian/r-cran-runit/usr/lib/R/site-library/RUnit/examples/runitVirtualClassTest.r.
[..]

Found it, now we quit, patch the source code, and try again.

>>> Q
[..]
exit code 1
[.. hack hack hack ..]
$ ../r-mini-repro-test.sh r-cran-runit-0.4.31/ 123
[..]
01347305351a45bb88dab909c56942498e97f5edbe583cf6ab1327138914e315  ./debian/123/usr/lib/R/site-library/RUnit/R/RUnit.rdx
ecde40bdfac5b070acc30a0d5889d9d7f65418844d14312aceff6af5b9d0bf7a  ./debian/123/usr/lib/R/site-library/RUnit/R/RUnit.rdb
182d38ac5adc160eb9edd40d5bd7e27b95b756b8f14cee7d912ba4745ae81bdc  ./debian/123/usr/lib/R/site-library/RUnit/help/RUnit.rdx
f8a9f22acc9f937cb47a92f3979d619e83a791ad734796c2fcd9d0a34e1f11ca  ./debian/123/usr/lib/R/site-library/RUnit/help/RUnit.rdb
5c37672fdc1f325b8a3846f4eafd2624b246ec70ce01d197134fa1b97a9ee849  ./debian/123/usr/lib/R/site-library/RUnit/help/paths.rds
$ ../r-mini-repro-test.sh r-cran-runit-0.4.31/ 1234
[..]
01347305351a45bb88dab909c56942498e97f5edbe583cf6ab1327138914e315  ./debian/1234/usr/lib/R/site-library/RUnit/R/RUnit.rdx
ecde40bdfac5b070acc30a0d5889d9d7f65418844d14312aceff6af5b9d0bf7a  ./debian/1234/usr/lib/R/site-library/RUnit/R/RUnit.rdb
182d38ac5adc160eb9edd40d5bd7e27b95b756b8f14cee7d912ba4745ae81bdc  ./debian/1234/usr/lib/R/site-library/RUnit/help/RUnit.rdx
f8a9f22acc9f937cb47a92f3979d619e83a791ad734796c2fcd9d0a34e1f11ca  ./debian/1234/usr/lib/R/site-library/RUnit/help/RUnit.rdb
5c37672fdc1f325b8a3846f4eafd2624b246ec70ce01d197134fa1b97a9ee849  ./debian/1234/usr/lib/R/site-library/RUnit/help/paths.rds

Works! All finished. :) If yours doesn't, just repeat the above loop; hopefully eventually you'll get there.

Posted 2017-05-03 12:37:58 UTC Tags: reproducible builds

Here's what happened in the Reproducible Builds effort between Sunday April 23 and Saturday April 29 2017:

Past and upcoming events

On April 26th Chris Lamb gave a talk at foss-north 2017 in Gothenburg, Sweden on Reproducible Builds.

Between May 5th-7th the Reproducible Builds Hackathon 2017 will take place in Hamburg, Germany.

Then on May 26th Bernhard M. Wiedemann will give a talk titled reproducible builds in openSUSE (2017) at the openSUSE Conference 2017 in Nürnberg, Germany.

Media coverage

Already on April 19th Sylvain Beucler wrote a yet another follow-up post Practical basics of reproducible builds 3, after part 1 and part 2 of his series.

Toolchain development and fixes

Michael Woerister of the Rust project has implemented file maps that affect all path-related compiler information, including "error messages, metadata, debuginfo, and the file!() macro alike". Ximin Luo with support from some other Rust developers and contributors helped steer the final result into something that was compatible with reproducible builds. Many thanks to all involved, especially for the patience of discussing this over several months.

Ximin wrote a first-attempt patch to fix R build-path issues. It made 460/477 R packages reproducible, but also caused 3 of these to FTBFS. See randomness_in_r_rdb_rds_databases for details.

Bugs filed and patches sent upstream

Chris Lamb:

Bernhard M. Wiedemann filed a number of patches upstream:

Reviews of unreproducible packages

102 package reviews have been added, 64 have been updated and 24 have been removed in this week, adding to our knowledge about identified issues.

3 issue types have been updated:

Weekly QA work

During our reproducibility testing, FTBFS bugs have been detected and reported by:

  • Aaron M. Ucko (1)
  • Adrian Bunk (1)
  • Chris Lamb (4)
  • Santiago Vila (2)

diffoscope development

diffoscope 82 was uploaded to experimental by Chris Lamb. It included contributions from:

  • Chris Lamb:
    • Add support for Ogg Vorbis files.
  • Vagrant Cascadian:
    • Add support for .dtb (device tree blob) files. (Closes: #861109).

Changes from previous weeks that were also released with 82:

  • Ximin Luo
    • Add support for R .rds and .rdb object files.
  • Chris Lamb
    • Add support for comparing Pcap files.
    • Add support for .docx and .odt files via docx2txt & odt2txt.
    • Add support for PGP files via pgpdump.
    • Various documentation and test improvements.
    • Various bug fixes and code quality improvements.
  • Sylvain Beucler
    • Display differences in zip platform-specific timestamps.

Misc.

This week's edition was written by Ximin Luo, Chris Lamb and Holger Levsen & reviewed by a bunch of Reproducible Builds folks on IRC & the mailing lists.

Posted 2017-05-03 15:52:26 UTC Tags: reproducible builds

Here's what happened in the Reproducible Builds effort between Sunday April 30 and Saturday May 6 2017:

Past and upcoming events

Between May 5th-7th the Reproducible Builds Hackathon 2017 took place in Hamburg, Germany.

On May 6th Mattia Rizzolo gave a talk on Reproducible Builds at DUCC-IT 17 in Vicenza, Italy.

On May 13th Chris Lamb will give a talk on Reproducible Builds at OSCAL 2017 in Tirana, Albania.

Media coverage

Toolchain development and fixes

Packages reviewed and fixed, and bugs filed

Chris Lamb:

Reviews of unreproducible packages

93 package reviews have been added, 12 have been updated and 98 have been removed in this week, adding to our knowledge about identified issues.

The following issues have been added:

2 issue types have been updated:

The following issues have been removed:

Weekly QA work

During our reproducibility testing, FTBFS bugs have been detected and reported by:

  • Chris Lamb (3)

diffoscope development

strip-nondeterminism development


This week's edition was written by Chris Lamb, Holger Levsen and Ximin Luo & reviewed by a bunch of Reproducible Builds folks on IRC & the mailing lists.

Posted 2017-05-09 08:53:45 UTC Tags: reproducible builds

Here's what happened in the Reproducible Builds effort between Sunday May 7 and Saturday May 13 2017:

Report from Reproducible Builds Hamburg Hackathon

We were 16 participants from 12 projects: 7 Debian, 2 repeatr.io, 1 ArchLinux, 1 coreboot + LEDE, 1 F-Droid, 1 ElectroBSD + privoxy, 1 GNU R, 1 in-toto.io, 1 Meson and 1 openSUSE. Three people came from the USA, 3 from the UK, 2 Finland, 1 Austria, 1 Denmark and 6 from Germany, plus we several guests from our gracious hosts at the CCCHH hackerspace as well as a guest from Australia…

We had four presentations:

Some of the things we worked on:

  • h01ger did orga stuff for this very hackathon, discussed tests.r-b.o with various non-Debian contributors, filed some bugs and restarted the policy discussion in #844431. He also did some polishing work on tests.r-b.o which shall be covered in next issue of our weekly blog.
  • Justin Cappos involved many of us in interesting discussions and started to write an academic paper about Reproducible Builds of which he shared an early beta on our mailinglist.
  • Chris Lamb (lamby) filed a number of patches for individual packages, worked on diffoscope, merged many changes to strip-nondeterminism and also filed #862073 against dak to upload buildinfo files to external services.
  • Maria Glukhova (siamezzze) fixed a bug with plots on tests.reproducible-builds.org and worked on diffoscope test coverage.
  • Lynxis worked on a new squashfs upstream release improving support for reproducible squashfs filesystems and also had some time to hack on coreboot and show others how to install coreboot on real hardware.
  • Michael Poehn worked on integrating F-Droid builds into tests.reproducible-builds.org, on the F-Droid verification utility and also ran some app reproducibility tests.
  • Bernhard worked on various unreproducible issues upstream and submitted fixes for curl, bzr, ant.
  • Erin Myhre worked on bootstrapping cleanroom builds of compiler components in Repeatr sandboxes.
  • Calvin Behling merged improvements to reppl for a cleaner storage format and better error handling and did design work for next version of repeatr pipeline execution. Calvin also lead the reproducibility testing of restaurant mood lighting.
  • Eric and Calvin also claim to have had all sorts of useful exchanges about the state of other projects, and learned a lot about where to look for more info about debian bootstrap and archive mirroring from steven and lamby :)
  • Phil Hands came by to say hi and worked on testing d-i on jenkins.debian.net.
  • Chris West (Faux) worked on extending misc.git:has-only.py, and started looking at Britney.

We had a Debian focussed meeting where we discussed a number of topics:

  • IRC meetings: yes, we want to try again to have them, monthly, a poll for a good date is being held.
  • Debian tests post Stretch: we'll add tests for stable/Stretch.
  • .buildinfo files, how forward: we need sourceful uploads for any arch:all packages. dak should send .buildinfo files to buildinfo.debian.net.
  • (pre?) Stretch release press release: we should do that, esp. as our achievements are largely unrelated to Stretch.
  • Reproducible Builds Summit 3: yes, we want that.
  • what to do (in notes.git) with resolved issues: keep the issues.
  • strip-nondeterminism quo vadis: Justin reminded us that strip-nondeterminism is a workaround we want to get rid off.

And then we also had a lot of fun in the hackerspace, enjoying some of their gimmicks, such as being able to open physical doors with ssh or controlling light and music with an webbrowser without authentication (besides being in the right network).

Not quite the hackathon

(This wasn't the hackathon per-se, but some of us appreciated these sights and so we thought you would too.)

Many thanks to:

  • Debian for sponsoring food and accommodation!
  • Dock Europe for providing us with really nice accommodation in the house!
  • CCC Hamburg for letting us use their hackerspace for >3 days non-stop!

News and media coverage

openSUSE has had a security breach in their infrastructure, including their build services. As of this writing, the scope and impact are still unclear, however the incident illustrates that no one should rely on being able to secure their infrastructure at all times. Reproducible Builds help mitigate this by allowing independent verification of build results, by parties that are unaffected by the compromise.

(Whilst this can happen to anyone. Kudos to openSUSE for being open about it. Now let's continue working on Reproducible Builds everywhere!)

On May 13th Chris Lamb gave a talk on Reproducible Builds at OSCAL 2017 in Tirana, Albania.

OSCAL 2017

Toolchain bug reports and fixes

Packages' bug reports

Reviews of unreproducible packages

11 package reviews have been added, 2562 have been updated and 278 have been removed in this week, adding to our knowledge about identified issues. Most of the updates were to move ~1800 packages affected by the generic catch-all captures_build_path (out of ~2600 total) to the more specific gcc_captures_build_path, fixed by our proposed patches to GCC.

5 issue types have been updated:

Weekly QA work

During our reproducibility testing, FTBFS bugs have been detected and reported by:

  • Adrian Bunk (1)
  • Chris Lamb (2)
  • Chris West (1)

diffoscope development

diffoscope development continued on the experimental branch:

  • Maria Glukhova:
    • Code refactoring and more tests.
  • Chris Lamb:
    • Add safeguards against unpacking recursive or deeply-nested archives. (Closes: #780761)

strip-nondeterminism development

  • strip-nondeterminism 0.033-1 and -2 were uploaded to unstable by Chris Lamb. It included contributions from:

  • Bernhard M. Wiedemann:

    • Add cpio handler.
    • Code quality improvements.
  • Chris Lamb:
    • Add documentation and increase verbosity, in support of the long-term aim of removing the need for this tool.

reprotest development

  • reprotest 0.6.1 and 0.6.2 were uploaded to unstable by Ximin Luo. It included contributions from:

  • Ximin Luo:

    • Add a documentation section on "Known bugs".
    • Move developer documentation away from the man page.
    • Mention release instructions in the previous changelog.
    • Preserve directory structure when copying artifacts. Otherwise hash output on a successful reproduction sometimes fails, because find(1) can't find the artifacts using the original artifact_pattern.
  • Chris Lamb
    • Add proper release instructions and a keyring.

trydiffoscope development

  • Chris Lamb:
    • Uses the diffoscope from Debian experimental if possible.

Misc.

This week's edition was written by Ximin Luo, Holger Levsen and Chris Lamb & reviewed by a bunch of Reproducible Builds folks on IRC & the mailing lists.

Here's what happened in the Reproducible Builds effort between Sunday May 14 and Saturday May 20 2017:

News and Media coverage

  • We've reached 94.0% reproducible packages on testing/amd64! (NB. without build path variation)
  • Maria Glukhova was interviewed on It's FOSS about her involvement with Reproducible Builds with respect to Outreachy.

IRC meeting

Our next IRC meeting has been scheduled for Thursday June 1 at 16:00 UTC.

Packages reviewed and fixed, bugs filed, etc.

Bernhard M. Wiedemann:

Chris Lamb:

Reviews of unreproducible packages

35 package reviews have been added, 28 have been updated and 12 have been removed in this week, adding to our knowledge about identified issues.

2 issue types have been added:

diffoscope development

strip-nondeterminism development

tests.reproducible-builds.org

Holger wrote a new systemd-based scheduling system replacing 162 constantly running Jenkins jobs which were slowing down job execution in general:

  • Nothing fancy really, just 370 lines of shell code in two scripts, out of these 370 lines 80 are comments and 162 are node defitions for those 162 "jobs".
  • Worker logs not yet as good as with Jenkins but usually we don't need real time log viewing of specific builds. Or rather, its a waste of time to do it. (Actual package build logs remain unchanged.)
  • Builds are a lot faster for the fast archs, but not so much difference on armhf.
  • Since April 12 for i386 (and a week later for the rest), the images below are ordered with i386 on top, then amd64, armhf and arm64. Except for armhf it's pretty visible when the switch was made.

Misc.

This week's edition was written by Chris Lamb, Holver Levsen, Bernhard M. Wiedemann, Vagrant Cascadian and Maria Glukhova & reviewed by a bunch of Reproducible Builds folks on IRC & the mailing lists.

Posted 2017-05-23 18:43:39 UTC Tags: reproducible builds

Here's what happened in the Reproducible Builds effort between Sunday May 21 and Saturday May 27 2017:

Past and upcoming events

Bernhard M. Wiedemann gave a short talk on reproducible builds in openSUSE at the openSUSE Conference 2017. Slides and video recordings are available on that page.

Chris Lamb will present at the Hong Kong Open Source Conference 2017 on reproducible builds on June 9th.

Our next IRC meeting has been scheduled for Thursday June 1 at 16:00 UTC with this agenda.

Academia

Justin Cappos continued his work on the reproducible builds paper, with text and suggestions from Ximin Luo integrated.

Toolchain developments

#863470: "ftp.debian.org: security sync must not exclude .buildinfo" - while this bug isn't fixed, you need to make sure not to build jessie updates with stretch's dpkg, or else the upload will be rejected.

Ximin Luo built GCC twice and ran diffoscope on them. Unfortunately the results were 1.7 GB in size and it can't be displayed in a web browser. 99/171 of the .debs are reproducible, though. He's now working on diffoscope (see below) to make it generate output more intelligently for such large size diffs. Here is a summary diff where the recursion depth cut-off was set low, so the size is reasonable and one can still see the outlines of where to look next.

debuerreotype was newly added to Debian unstable. It is a reproducible, snapshot-based Debian rootfs builder.

Patches and bugs filed

Reviews of unreproducible packages

29 package reviews have been added, 49 have been updated and 23 have been removed in this week, adding to our knowledge about identified issues.

Weekly QA work

During our reproducibility testing, FTBFS bugs have been detected and reported by:

  • Adrian Bunk (10)
  • Chris Lamb (2)
  • James Clarke (1)

diffoscope development

Development continued in git, with commits from:

  • Ximin Luo:
    • Refactor container-related logic to make the code clearer.
    • Various improvements to the progress bar, making it behave more accurately and make it compatible with --debug logging output.
    • Fix --exclude control.tar.gz.
    • When enforcing max-container-depth, show which internal files differ, without showing their details.
    • Add --max-container-depth CLI option.

strip-nondeterminism development

Version 0.034-1 was uploaded to unstable by Chris Lamb. It included previous weeks' contributions from:

  • Chris Lamb
    • Only print log messages by default if the file was actually modified. (Closes: #863033)
  • Bernhard M. Wiedemann
    • zip: make sure we have permissions on extracted file
    • Add function prototypes.

tests.reproducible-builds.org:

  • Alexander Couzens
    • Use Alexander's LEDE git repo to test his mksquashfs patches.
  • Daniel Shahaf
    • Refactored reproducible_remote_scheduler.py to add support for multiple suites in one invocation.
  • Holger Levsen
    • Prevent the two fdroid jobs from running together by using the Build Blocker Plugin.
    • A niceness variation was also added (see #863440) to the Debian tests, but this change was reverted for now, as it was breaking stuff and needs to be readded properly.
    • Some adjustments to the Debian scheduler, still due to the improve performance through the new build services.
  • Mattia Rizzolo
    • Update dsa-check-running-kernel from dsa-nagios (to support kernel 4.x as present in stretch) on all jenkins nodes.

Misc.

This week's edition was written by Ximin Luo, Bernhard M. Wiedemann, Chris Lamb and Holger Levsen & reviewed by a bunch of Reproducible Builds folks on IRC & the mailing lists.

Posted 2017-05-30 17:05:23 UTC Tags: reproducible builds