statistics++: Making Fedora Project data accessible
Ian Weller, Fedora Engineering, Red Hat, Inc.
Project overview
Fedora Infrastructure has made only a limited foray into statistics. The Statistics page on the Fedora Project Wiki reports little beyond the number of HTTP requests made to various infrastructure applications and the number of wiki edits made per month.
The statistics app in the first version of Fedora Community attempted to improve on the Statistics page, but ultimately failed for two reasons: adding new and relevant automated queries to the platform was too complex, and Fedora's application servers could access only a limited amount of information.
With the messaging infrastructure planned for Fedora's infrastructure applications, a statistics application can listen on the message bus and record activity in a database for later retrieval.
statistics++ consists of three services:
- A server daemon that listens on the infrastructure message bus and records activity to a database
- An HTTP application that provides a RESTful web API for downloading data stored in the database
- An HTTP application that produces automated data displays such as tables or charts
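As a rough illustration of the first service, the sketch below records messages into a database. It is not the project's implementation: the message bus is stood in for by a plain Python iterable of (topic, payload) pairs, SQLite is an assumed storage choice, and the table layout, function names, and topic names are all hypothetical.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical table layout: one row per message, with the payload kept
# as JSON text so each application can send its own fields.
SCHEMA = """
CREATE TABLE IF NOT EXISTS activity (
    id INTEGER PRIMARY KEY,
    topic TEXT NOT NULL,
    recorded_at TEXT NOT NULL,
    payload TEXT NOT NULL
)
"""


def open_database(path=":memory:"):
    """Open (or create) the activity database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_message(conn, topic, payload, recorded_at=None):
    """Store a single message under its topic with a UTC timestamp."""
    if recorded_at is None:
        recorded_at = datetime.now(timezone.utc)
    conn.execute(
        "INSERT INTO activity (topic, recorded_at, payload) VALUES (?, ?, ?)",
        (topic, recorded_at.isoformat(), json.dumps(payload, sort_keys=True)),
    )
    conn.commit()


def listen(conn, messages):
    """Record every (topic, payload) pair from an iterable.

    The iterable is an assumption standing in for the real message bus,
    whose client API is not specified here.
    """
    count = 0
    for topic, payload in messages:
        record_message(conn, topic, payload)
        count += 1
    return count
```

A real daemon would replace the iterable with a subscription to the message bus; keeping the payload as JSON text means a new application can be recorded without a schema change.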
Target audience
Justification
Goals
This project aims to solve the following problems:
- Data on the Statistics wiki page can only be generated and validated by those who have access to Fedora log servers.
- Data on the Statistics wiki page requires a human to generate the data each week.
- Data on the Statistics wiki page does not encompass all infrastructure applications.
- Data on the Statistics wiki page can be modified by anybody who can edit the wiki.
- Generating data for other infrastructure applications (such as FAS, Koji, and Bodhi) requires separate code for each application in order to download its data.
To solve these problems, statistics++ will have the following functionality:
- Open, read-only access to any anonymized data collected by infrastructure applications
- A standard RESTful API for downloading data
- Flexible schemas for storing and retrieving data from infrastructure applications
- Live updates of statistical data from infrastructure applications
- An interface for creating automated queries and representing data in tables or charts
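To make the read-only RESTful access concrete, here is a minimal sketch of an HTTP endpoint that returns monthly activity counts as JSON. Everything specific in it is an assumption for illustration: the `activity` table with `topic` and `recorded_at` (ISO 8601 text) columns, the `topic` query parameter, the function names, and the choice of a bare WSGI callable over any particular web framework.

```python
import json
import sqlite3
from urllib.parse import parse_qs

# Hypothetical table that the recording service is assumed to fill.
SCHEMA = """
CREATE TABLE IF NOT EXISTS activity (
    id INTEGER PRIMARY KEY,
    topic TEXT NOT NULL,
    recorded_at TEXT NOT NULL
)
"""


def monthly_counts(conn, topic):
    """Return one {"month", "count"} entry per month for the given topic."""
    rows = conn.execute(
        "SELECT substr(recorded_at, 1, 7), COUNT(*) "
        "FROM activity WHERE topic = ? "
        "GROUP BY substr(recorded_at, 1, 7) "
        "ORDER BY substr(recorded_at, 1, 7)",
        (topic,),
    ).fetchall()
    return [{"month": month, "count": count} for month, count in rows]


def make_app(conn):
    """Build a WSGI application that serves counts and rejects writes."""

    def app(environ, start_response):
        # Read-only access: anything other than GET is refused.
        if environ.get("REQUEST_METHOD") != "GET":
            start_response("405 Method Not Allowed", [("Allow", "GET")])
            return [b""]
        query = parse_qs(environ.get("QUERY_STRING", ""))
        topic = query.get("topic", [""])[0]
        body = json.dumps(monthly_counts(conn, topic)).encode("utf-8")
        start_response(
            "200 OK",
            [
                ("Content-Type", "application/json"),
                ("Content-Length", str(len(body))),
            ],
        )
        return [body]

    return app
```

The application returned by `make_app` can be served for local experiments with the standard library's `wsgiref.simple_server.make_server`. The same JSON output could also feed the table and chart displays, so that the display service does not need its own database queries.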