After messing with Docker enough to get comfortable with the way it works, I started thinking about a project I could do that would actually make my life easier right now. I have this server application that consists of a number of pieces. The main part is a python script that scans urls for information. I can run as many of these as I want. They get their target urls from redis, and write their work results back out to it as well. Currently they log to files in a custom format, including the pid so I can tell which instance was doing what.
When I want to run these things on AWS I open up a terminal session, load the initial redis database, then use screen to run as many instances of the spider as I want. Once they’re running I monitor output queues in redis, and sometimes tail the logs as well. I’d like to can this whole thing up in Docker so that I can just pull it down from my repo to a new AWS instance and connect to it from outside to monitor progress. I’d like the container to start up as many instances of the spider as I tell it to, and collect the log information for me in a more usable way.
To do this my plan is to change the spider as follows: I will rewrite the entry code so that it takes the number of instances to run as an argument, and then uses the process class to launch the instances in separate processes. I will also rewrite the logging code to log messages to a redis list instead of files. I will probably do this by launching a second redis instance on a different port, because the existing instance runs in aof mode, and that’s overkill for logging a ton of messages. Lastly I am going to install logstash and kibana. I’ll tell logstash to consume the logging messages from redis and insert them into its internal Elastic Search db, and I’ll use kibana to search and visualize this log data.
Redis, logstash, and kibana will all be set to run as daemons when the container starts, and the main container command will be a shell script that launches the spiders. The Docker image will expose the two redis ports, and the kibana web port. If all goes as planned I should be able to launch the container and connect to redis and kibana from outside to monitor progress. I have a quick trip to Miami at the start of this week, so I won’t be able to set this up until I get back, but when I do I’ll post here about my experiences and results.