I was building out a search server container with Elasticsearch 1.0.1 today, and I ran into one of those irritating little problems that I could solve a lot faster if I just looked more carefully at what is actually going on. One of the steps in the build is to clone some stuff from our git repo, including config files that get copied to various places. In the process of testing I added a new file and pushed it, then re-ran the build. Halfway through I got a stat error from a cp command that couldn’t find the file.
But, but, I had pushed it, and pulled the repo, so where the hell was it? Yesterday something similar had happened when building a logstash/redis container. One of the nice things about a Docker build is that it leaves the interim containers around until the end of the build (or forever if you don’t use the --rm=true option). So you can start up a container from the last successful build step and look around inside it. In yesterday’s case it turned out I was pushing to one branch and cloning from another.
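A rough sketch of how that poking around might be scripted, assuming the build output was saved to a file and that each finished step is followed by a " ---> <image id>" line, as the classic builder prints it. The helper name last_image_id and the file name build.log are made up for illustration.

```shell
# Hypothetical helper: print the ID of the last image the build committed.
# Assumes lines of the form " ---> <hex id>". Lines such as
# " ---> Running in <id>" or " ---> Using cache" carry extra words and are skipped.
last_image_id() {
    awk '$1 == "--->" && NF == 2 && $2 ~ /^[0-9a-f]+$/ { id = $2 }
         END { if (id != "") print id }'
}

# Possible usage (build.log is an assumed file name, and the image is
# assumed to contain /bin/bash):
#   sudo docker build - < DockerFile | tee build.log
#   sudo docker run -i -t "$(last_image_id < build.log)" /bin/bash
```

If the build died at step N, the ID printed should be the image produced by step N-1, which is the one worth looking around in.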
But that problem had been solved yesterday. Today’s problem was different, because I was definitely cloning the right branch. I took a closer look at the output from the Docker build, and where I expected to see…
Step 4 : RUN git clone blahblahblah.git
 ---> Running in 51c842191693
I instead saw…
Step 4 : RUN git clone blahblahblah.git
 ---> Using cache
Docker was assuming the effect of the RUN command was deterministic and was reusing the interim image from the last time I ran the build. Interestingly, it did the same thing with a later wget command that downloaded an external package. I’m not sure how those commands could ever be considered deterministic, since they pull data from outside sources, but whatever. The important thing is that you can add the --no-cache option to the build command to get Docker to ignore the cache.
sudo docker build --no-cache --rm=true - < DockerFile
Note that this applies to the whole build, so if you do have some other commands that are in fact deterministic, they are not going to use the cache either. It would be nice to have an argument to the RUN command to do this on a per-step basis, but at least --no-cache will make sure all your RUN steps get evaluated every time you build.
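Since the whole problem came down to not noticing "Using cache" in the output, here is a minimal sketch of a filter that lists the steps Docker served from its cache. It assumes the output format shown earlier in this post (a "Step N : ..." line followed by " ---> Using cache"). The function name cached_steps is invented for this example.

```shell
# Hypothetical helper: read build output on stdin and print every step
# that was satisfied from the cache instead of being run.
cached_steps() {
    awk '
        /^Step [0-9]/ { step = $0; next }                    # remember the most recent step line
        /^ *---> Using cache/ && step != "" { print step; step = "" }
    '
}

# Possible usage:
#   sudo docker build --rm=true - < DockerFile | cached_steps
```

If a git clone or wget step shows up in that list, that is the cue to rebuild with the cache disabled.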
What you could try doing (a solution I used for a similar problem, though with Debian packages) is to keep some kind of “version” file inside the Docker context and ADD it just before “RUN git clone…”. Then use a shell script to modify the “version” file whenever there is a change in the repository. (Instead of this file you could use an ENV directive whose value you change just before “RUN git clone”.)
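A minimal sketch of the “version” file idea, under a few assumptions: the file is called version, it sits in the build context, and the Dockerfile ADDs it right before the clone step. The function name write_if_changed is made up, and using the remote HEAD commit as the version value is just one possible choice.

```shell
# Hypothetical helper: write a value to a file only when it differs from
# what the file already holds, so the file changes exactly when the
# repository does.
write_if_changed() {
    target_file=$1
    new_value=$2
    if [ -f "$target_file" ] && [ "$(cat "$target_file")" = "$new_value" ]; then
        return 0    # unchanged: leave the file alone
    fi
    printf '%s\n' "$new_value" > "$target_file"
}

# Possible usage before each build (the repo URL is whatever you clone from):
#   write_if_changed version "$(git ls-remote blahblahblah.git HEAD | cut -f1)"
#   sudo docker build --rm=true .
```

With this in place the ADD step, and everything after it, should be rebuilt only when the recorded commit changes, while earlier steps keep using the cache.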
In general, if it’s just about a source tree, you should try cloning it locally and making use of the ADD command. ADD will use the cache ONLY if the hash of the source did not change, so any modification in the repository would cause the cache to be invalidated, but only for ADD and later directives.
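One way the local clone might be kept current before each build is sketched below. The function name sync_checkout, the src directory, and the master branch name are assumptions for illustration, not anything from the original setup.

```shell
# Hypothetical helper: make sure a local checkout of a branch exists and
# matches the remote, so that ADD sees the current tree.
sync_checkout() {
    repo_url=$1
    branch=$2
    dest=$3
    if [ -d "$dest/.git" ]; then
        # Existing checkout: fetch and hard-reset to the remote branch.
        git -C "$dest" fetch origin "$branch" &&
        git -C "$dest" checkout "$branch" &&
        git -C "$dest" reset --hard "origin/$branch"
    else
        # First run: clone just the branch we care about.
        git clone --branch "$branch" "$repo_url" "$dest"
    fi
}

# Possible usage, with "ADD src /some/path" in the Dockerfile:
#   sync_checkout blahblahblah.git master src
#   sudo docker build --rm=true .
```

Note that this only works when the build is given a context directory (the "." above). Feeding just the Dockerfile on stdin sends no files for ADD to pick up.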
* --no-cache
Indeed!