2013 Fedora Infrastructure tasks
overview
This page collects the things we want to work on and get done in 2013. Initially it will help us organize what we want to get done at the upcoming FUDCon Lawrence (hackfests, presentations, etc.). After that it may be repurposed to track the things we are actually going to work on in the coming year.
fudcon
Let's coordinate and gather here the things we want to do at FUDCon. (Don't forget to add these to the main FUDCon page as soon as we have decided on them.)
technical sessions (friday)
hackfests (saturday and sunday)
- cloudy with a chance of infrastructure - finish up stuff around private clouds, move to production.
- revamp our apprentice/new contributor process - figure out a way to get more people involved long term. (more mentoring?)
- ansible - figure out any setup and questions, timetable to replace puppet
lightning talks (friday)
2013
This is a list of things we want to get done, grouped by the timeframes below.
2013 infrastructure FAD
The FAD worked great for getting two-factor auth done. If we can get funding, we should consider holding another one on a different topic. Ideas welcome here.
- monitoring - fix nagios, revamp how we manage it, make it stop bothering us all, but still tell us about issues, etc.
In the Fedora 19 cycle
- Move the publictest hosts to the cloud and set a sundown (expiry) policy for them
- Make a push-based fasClient with ansible; replace the fasClient cron job on the infra boxes with it.
In the Fedora 20 cycle
Idea box for 2012 and beyond
- Integrate jenkins into our infrastructure and framework (pingou)
- Make a clearer division between back-end and front-end in our (web) apps (pingou)
- Helps with testing (unit-tests)
- Reduces the application's dependency on a particular framework
- Automate the generation of the statistics report: https://fedoraproject.org/wiki/Statistics (pingou)
- Ok, I think I have this covered: https://github.com/pypingou/fedora-stats (cf http://ambre.pingoured.fr/fedora-stats2/ )
- Reduce the number of frameworks used? (pingou)
- This does mean porting old apps to a new(er) framework
- Question: what are we going to do when/if EL7 is released in 2013? (From an app point of view)
- Set up an Intrusion Detection System (lmacken)
- Have had great experiences using Suricata personally.
- restructure our app/proxy layout: (skvidal)
- Our current app model makes it difficult to determine which app is causing a problem, so our solutions tend to be pretty coarse-grained. Given the failure-prone state of our apps, we should adopt a model that makes it simpler to see where problems are coming from. As our apps stabilize, we can move to an environment that shares more resources.
- ARM servers in infrastructure
- Discuss issues around using some ARM instances for our needs.
- Would likely need to use Fedora instead of RHEL
- What things would be good for them?
- Revamp nagios
- Use check_mk on all machines and add a small number of custom checks on top.
- Automate adding nodes, etc
old stuff from 2011 / 2012
Here's stuff we talked about in the past and never got done:
- Upgrade TurboGears1 apps to TurboGears2
- Write automated tests using TG2's test framework
- Fix the FAS authenticators to be less chatty
- Put fas session information into memcached
- Update FAS to have an admin console (no more direct db needs)
- Update pkgdb to have an admin console (no more direct db needs)
- Fix the Django auth providers to be faster
- Move the publictest hosts to the cloud and set a sundown (expiry) policy for them
- Automated hosted projects (*)
- Automated creation of new machines -- run one command and it's up
- glusterfs/cloudfs fedorapeople filesystem
- Replicate db so that we don't have a SPOF
- logging sucks (*)
- IPs hit proxies but we also need them to hit the app servers. (*)
- FAS needs to log more actions to its database (this is in a new version of FAS, we just need to upgrade)
- Do periodic reinstallations of guests (like app servers) so that we know nothing has been changed outside of puppet.
- Reduce koji's resources
- Finish and deploy coprs
- go through list of rpm -Va on all hosts (in /var/tmp/global-rpm-va on puppet01) and make sure all the files there have counterparts in puppet to explain their changes (*)
- Look at whether the git email hook can be done async. If so, make it async and change it to query the packagedb for people to email instead of using the PACKAGE-owner email aliases. (This will eliminate bounces when the alias does not exist, for instance, new package requests and when the only owner of a package is orphan@fp.o)
- The puppet nodenames do not match the hostnames in nagios. Add aliases to the nagios hostnames to match them up correctly. This will allow us to trigger passive checks using nsca.
- Set up a schedule for rebooting hosts (to test for broken hardware when it's not a critical point in the release cycle)
See: https://fedoraproject.org/wiki/Infrastructure_Cleanup_Tasks_2011 for more.
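For the `rpm -Va` review item in the list above, a rough sketch of how the comparison might be automated. It assumes the collected output is plain `rpm -Va` text (an attribute string or `missing`, an optional file-type marker, then the path) and that a set of puppet-managed paths can be produced some other way. The function names and the `managed_paths` input are hypothetical, not existing tooling.

```python
def parse_rpm_va_line(line):
    """Split one line of `rpm -Va` output into (flags, path).

    Returns None for lines that do not look like a verify result.
    """
    idx = line.find("/")          # the file path starts at the first slash
    if idx <= 0:
        return None
    head = line[:idx].split()     # attribute string, optional marker (c, d, ...)
    if not head:
        return None
    return head[0], line[idx:].rstrip()


def unexplained_changes(rpm_va_lines, managed_paths):
    """Return (flags, path) pairs whose path is not in managed_paths.

    managed_paths is an assumed input: a set of file paths known to be
    managed by puppet, gathered by whatever means is available.
    """
    results = []
    for line in rpm_va_lines:
        parsed = parse_rpm_va_line(line)
        if parsed is None:
            continue
        flags, path = parsed
        if path not in managed_paths:
            results.append((flags, path))
    return results


if __name__ == "__main__":
    # Illustrative sample lines, not real data from any host.
    sample = [
        "S.5....T.  c /etc/ssh/sshd_config",
        "missing     /usr/share/doc/foo/README",
        ".M.......    /var/log/httpd",
    ]
    managed = {"/etc/ssh/sshd_config"}
    for flags, path in unexplained_changes(sample, managed):
        print(flags, path)
```

Anything this prints would still need a human to decide whether the change belongs in puppet or should be reverted.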