Build as much as possible from sources
There was a discussion about putting everything in Fedora where possible, that is:
- nvidia-settings
- nvidia-xconfig
- nvidia-persistenced
- egl-wayland
- libglvnd
This is for a few reasons:
- Build options (optimizations, no GTK 2 on Fedora, no GTK 3 on RHEL 6).
- Avoiding having multiple "Gnome Software entries" for the various drivers. Richard Hughes asked me to break the dependency between the driver and nvidia-settings. I guess we can still have the main driver package require the nvidia-settings control panel, though. So we have "free" components in Fedora not requiring non-free components.
- Easier to maintain, we can just patch / update each component without providing an entire new driver package.
- Some parts are already following this pattern (egl-wayland for example).
This of course would not play well with this:
https://rpmfusion.org/Howto/nVidia?highlight=%28CategoryHowto%29#Latest.2FBeta_driver
As the source-built components would be tied to specific library versions in the distribution. What we could do is follow this pattern (in order of preference):
- EPEL: Long lived release
- Fedora: Short lived release, Long lived release
- Rawhide: Beta, Short lived release, Long lived release
This way you would reiterate the basic targets of the distributions: a slowly changing target for EPEL and a fast pace for Fedora.
So with the above in mind, an example with fake numbers (there is no short lived release at the moment):
- EPEL 6: 370.xx
- EPEL 7: 370.xx
- Fedora 24: 375.xx
- Fedora 25: 375.xx
- Fedora rawhide: 378.xx
If 378 becomes a short lived release it gets promoted to main Fedora, etc. This also gives enough time to support new features (for example the latest EGL external platform) without having to address new hardware support in a hurry.
Source tarballs
If we consider the source building above, we are actually ignoring a lot of things in the driver makeself archive:
- Non-GLVND GL libraries (useless)
- libglvnd libraries (built from source in main Fedora)
- nvidia-settings (built from source in main Fedora)
- nvidia-modprobe (useless)
- nvidia-installer (useless)
- nvidia-persistenced (built from source in main Fedora)
- libvdpau (built from source in main Fedora)
- old libraries - TLS, wfb, etc. (useless)
This actually brings the size of the tarball down to almost 50% of the original. I understand the use of the kmodsrc subpackage in RPMFusion, but since we're already trashing 50% of the tarball, maybe we could regenerate the tarball itself and have separate archives for the kernel and user space components. The Fedora packaging guidelines state that you can regenerate the tarball from upstream sources if required.
An example: https://github.com/negativo17/nvidia-driver/blob/master/nvidia-generate-tarballs.sh
I understand the additional work involved if you have one or two extra tarballs to update, and that's what the kmodsrc subpackage is trying to address, but the update would really be:
<pre>
rpmdev-bumpspec -c "Update to XXX." -n XXXX <specfile>
fedpkg new-sources <tarball>
</pre>
I don't see much work here. This split would bring the following benefits:
- Smaller tarballs, more than 50% reduction in size in the src.rpm, so faster build times and uploads in Koji
- Treating the kernel module source as a completely separate package allows you to update it with its own versioning/numbering. This helps both when updating the kernel module (for a kernel patch, for example) and when updating the main driver package, as an update there would not trigger a rebuild of the kernel module, etc.
- An additional kmodsrc subpackage can be avoided.
Actually, we could also further reduce the size of the kernel module source tarball the way that Debian does:
<pre>
--- a/nvidia/nvidia.Kbuild
+++ b/nvidia/nvidia.Kbuild
@@ -37,7 +37,11 @@ NVIDIA_KO = nvidia/nvidia.ko
 # and needs to be re-executed.
 #
 
-NVIDIA_BINARY_OBJECT := $(src)/nvidia/nv-kernel.o_binary
+NVIDIA_BINARY_OBJECT-$(CONFIG_X86_32) += nv-kernel-i386.o_binary
+NVIDIA_BINARY_OBJECT-$(CONFIG_X86_64) += nv-kernel-amd64.o_binary
+NVIDIA_BINARY_OBJECT-$(CONFIG_ARM) += nv-kernel-armhf.o_binary
+NVIDIA_BINARY_OBJECT-$(CONFIG_PPC64) += nv-kernel-ppc64el.o_binary
+NVIDIA_BINARY_OBJECT := $(src)/nvidia/$(NVIDIA_BINARY_OBJECT-y)
 NVIDIA_BINARY_OBJECT_O := nvidia/nv-kernel.o
 
 quiet_cmd_symlink = SYMLINK $@
--- a/nvidia-modeset/nvidia-modeset.Kbuild
+++ b/nvidia-modeset/nvidia-modeset.Kbuild
@@ -35,7 +35,11 @@ NV_KERNEL_MODULE_TARGETS += $(NVIDIA_MOD
 # But, the target for the symlink rule should be prepended with $(obj).
 #
 
-NVIDIA_MODESET_BINARY_OBJECT := $(src)/nvidia-modeset/nv-modeset-kernel.o_binary
+NVIDIA_MODESET_BINARY_OBJECT-$(CONFIG_X86_32) += nv-modeset-kernel-i386.o_binary
+NVIDIA_MODESET_BINARY_OBJECT-$(CONFIG_X86_64) += nv-modeset-kernel-amd64.o_binary
+NVIDIA_MODESET_BINARY_OBJECT-$(CONFIG_ARM) += nv-modeset-kernel-armhf.o_binary
+NVIDIA_MODESET_BINARY_OBJECT-$(CONFIG_PPC64) += nv-modeset-kernel-ppc64el.o_binary
+NVIDIA_MODESET_BINARY_OBJECT := $(src)/nvidia-modeset/$(NVIDIA_MODESET_BINARY_OBJECT-y)
 NVIDIA_MODESET_BINARY_OBJECT_O := nvidia-modeset/nv-modeset-kernel.o
 
 quiet_cmd_symlink = SYMLINK $@
</pre>
This way we would have one tarball, a few binary objects and one patch.
Use ldconfig to create symlinks for the libraries while building
By doing this, you can avoid links that are actually not required. Many libraries reference the others only by the file name containing the full driver version. For example:
<pre>
$ rpm -ql nvidia-driver-libs.x86_64
/usr/lib64/libEGL_nvidia.so.0
/usr/lib64/libEGL_nvidia.so.375.26
/usr/lib64/libGLESv1_CM_nvidia.so.1
/usr/lib64/libGLESv1_CM_nvidia.so.375.26
/usr/lib64/libGLESv2_nvidia.so.2
/usr/lib64/libGLESv2_nvidia.so.375.26
/usr/lib64/libGLX_indirect.so.0
/usr/lib64/libGLX_nvidia.so.0
/usr/lib64/libGLX_nvidia.so.375.26
/usr/lib64/libnvidia-cfg.so.1
/usr/lib64/libnvidia-cfg.so.375.26
/usr/lib64/libnvidia-egl-wayland.so.375.26
/usr/lib64/libnvidia-eglcore.so.375.26
/usr/lib64/libnvidia-glcore.so.375.26
/usr/lib64/libnvidia-glsi.so.375.26
/usr/lib64/libnvidia-tls.so.375.26
/usr/lib64/vdpau/libvdpau_nvidia.so.1
/usr/lib64/vdpau/libvdpau_nvidia.so.375.26
/usr/share/glvnd/egl_vendor.d/10_nvidia.json
</pre>
Example: https://github.com/negativo17/nvidia-driver/blob/master/nvidia-driver.spec#L214-L215
Which then get packaged in the %files section: https://github.com/negativo17/nvidia-driver/blob/master/nvidia-driver.spec#L433-L470
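Roughly, those spec lines just run ldconfig against the build root during %install, so the only symlinks created are the ones matching the actual SONAMEs. A minimal sketch of the approach (paths are illustrative; the linked spec is the reference):
<pre>
# In %install, after the versioned libraries have been copied into the buildroot:
# -n only processes the directories given on the command line and does not
# touch the system cache, so this just creates the SONAME symlinks.
ldconfig -vn %{buildroot}%{_libdir}
ldconfig -vn %{buildroot}%{_libdir}/vdpau
</pre>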
In RPMFusion there are symlinks for every library, even when they do not actually reflect the SONAME:
https://pkgs.rpmfusion.org/cgit/nonfree/xorg-x11-drv-nvidia.git/tree/xorg-x11-drv-nvidia.spec#n222-n236 https://pkgs.rpmfusion.org/cgit/nonfree/xorg-x11-drv-nvidia.git/tree/xorg-x11-drv-nvidia.spec#n277
Which are then packaged: https://pkgs.rpmfusion.org/cgit/nonfree/xorg-x11-drv-nvidia.git/tree/xorg-x11-drv-nvidia.spec#n535-n591
Nothing special, just that there are some additional links inside the package which are actually never used, and which even a normal ldconfig run would not create, as the shared object names are different.
Split out CUDA libraries from the main CUDA package
Basically create xorg-x11-drv-nvidia-cuda-libs with the CUDA libraries. I think it's quite important.
The reasoning behind this is that if you want to provide a program that requires any of the Nvidia libraries (when you are not allowed to dlopen them at runtime), you need to provide those libraries on the system, and having the driver part and the libraries together in the end pulls in all the driver CUDA components.
This is something that you might want to avoid on a non-Nvidia system, and it's also handy for creating subpackages that add the support. I know this is not allowed by policies and might not apply to everything, but as an example we have at least the hardware encoding support for Steam In-Home Streaming in RPMFusion that could leverage this.
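To make the split concrete, a minimal sketch of how such a subpackage could look in the spec (names, file list and layout are illustrative only):
<pre>
%package cuda-libs
Summary: CUDA/NVENC/NVCUVID libraries from the proprietary NVIDIA driver

%description cuda-libs
Runtime libraries for CUDA, hardware video encoding (NVENC) and decoding
(NVCUVID), split out so other packages can depend on them without pulling
in the X.org driver bits.

%files cuda-libs
%{_libdir}/libcuda.so.1
%{_libdir}/libcuda.so.%{version}
%{_libdir}/libnvcuvid.so.1
%{_libdir}/libnvcuvid.so.%{version}
%{_libdir}/libnvidia-encode.so.1
%{_libdir}/libnvidia-encode.so.%{version}
</pre>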
For example, one user could also create separate packages requiring the functionality:
<pre>
$ rpm -q --requires steam | egrep cuda
xorg-x11-drv-cuda-libs(x86-32)
</pre>
Basically Steam needs the 32 bit variant of libnvidia-encode.so.1 (NVENC) on the system before you can use the accelerated hardware encoding. At the moment, this can't be done in RPMFusion as you would need to install the full 32 bit Nvidia driver package even on a 64 bit system.
Also, this way we could ship things "built" with NVENC support without forcing anyone to have the full-blown driver installed, just the libraries, pretty much like every other codec/format combination.
<pre>
$ rpm -q --requires ffmpeg-libs.i686 | egrep "cuda|cuvid"
libcuda.so.1()
libnvcuvid.so.1()
</pre>
Again, FFmpeg libs (32 bit) will pull in just the libraries required and not the full blown driver.
<pre>
$ rpm -q --requires gstreamer1-plugins-bad-nvenc | grep nvidia
libnvidia-encode.so.1()(64bit)
</pre>
The libnvcuvid.so.1 library comes from the CUDA libraries part of the Nvidia driver.
This is also helpful if you want to compile stuff for CUDA without having an Nvidia card in the system. For example, you can install CUDA on an Intel-only system together with the CUDA libraries from the driver and build the CUDA kernels for Blender there, for running Blender on an Nvidia-powered system.
Blender in particular dlopens libcuda and libnvrtc if available. The CUDA kernels in that subpackage only need to be installed along with CUDA, so you can create a subpackage with the hard dependency:
<pre>
$ rpm -q --requires blender-cuda | grep cuda
cuda-nvrtc(x86-64)
xorg-x11-drv-cuda-libs(x86-64)
</pre>
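In spec terms, the hard dependency for such a subpackage could be declared roughly like this (a sketch; the subpackage layout follows the example output above):
<pre>
%package cuda
Summary: Pre-built CUDA kernels for Blender
Requires: blender%{?_isa} = %{version}-%{release}
# The CUDA kernels are useless without the runtime pieces, so depend on
# them explicitly instead of failing at dlopen time:
Requires: cuda-nvrtc%{?_isa}
Requires: xorg-x11-drv-cuda-libs%{?_isa}

%description cuda
Pre-built CUDA kernels for Blender.
</pre>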
Preloading nvidia-uvm
nvidia-uvm is required for CUDA stuff. It needs to be loaded at boot if you want CUDA support, so it actually gets pre-loaded when installing the CUDA subpackage (again, this is something that happens only when you want CUDA support, so the above thinking for the libraries applies).
There's a caveat, though: you can't just hard-load it in the modprobe configuration. You need to load it after the main module if CUDA support is wanted, or it causes problems with the rebuilding of the initrd.
<pre>
$ cat /usr/lib/modprobe.d/nvidia-uvm.conf
# Make a soft dependency for nvidia-uvm as adding the module loading to
# /usr/lib/modules-load.d/nvidia-uvm.conf for systemd consumption, makes the
# configuration file to be added to the initrd but not the module, throwing an
# error on plymouth about not being able to find the module.
# Ref: /usr/lib/dracut/modules.d/00systemd/module-setup.sh
# Even adding the module is not the correct thing, as we don't want it to be
# included in the initrd, so use this configuration file to specify the
# dependency.
softdep nvidia post: nvidia-uvm
</pre>
There's also the nvidia-modprobe command in the Nvidia drivers that does the same, but that's a SETUID binary that just forcefully loads the module into the running kernel and sets the required permissions for the user. By using the above snippet, we can avoid that.
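For the install-time loading, the scriptlet in the CUDA subpackage can stay very small; something along these lines (subpackage name and guard are illustrative):
<pre>
%post -n xorg-x11-drv-nvidia-cuda
# Load nvidia-uvm right away on a fresh install so CUDA works without a
# reboot; later boots are handled by the softdep configuration above.
if [ "$1" -eq 1 ]; then
    /usr/sbin/modprobe nvidia-uvm > /dev/null 2>&1 || :
fi
</pre>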
DKMS kernel modules
Actually, just providing this was requested by Hans. I guess we can have both (akmods & DKMS) in RPMFusion. DKMS is used by most people on RHEL; it's the default for the ZFS project, it's used by Dell for updated drivers and by other vendors, including Nvidia in the default makeself archive. I think it would be good to have, even if it would not be the recommended and advertised solution.
Again, by regenerating the tarballs as in the previous points, or just by using the kmodsrc subpackage, we can enable this "variant" very easily.
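To give an idea of what the DKMS variant would carry, a minimal dkms.conf sketch (module list follows the current driver; the version and the exact make invocation are assumptions to be checked against the driver's own build system):
<pre>
PACKAGE_NAME="nvidia"
PACKAGE_VERSION="375.26"
# Build all kernel modules shipped with the driver for the target kernel.
MAKE[0]="make -j$(nproc) modules KERNEL_UNAME=${kernelver}"
CLEAN="make clean"
BUILT_MODULE_NAME[0]="nvidia"
BUILT_MODULE_NAME[1]="nvidia-uvm"
BUILT_MODULE_NAME[2]="nvidia-modeset"
BUILT_MODULE_NAME[3]="nvidia-drm"
DEST_MODULE_LOCATION[0]="/kernel/drivers/video"
DEST_MODULE_LOCATION[1]="/kernel/drivers/video"
DEST_MODULE_LOCATION[2]="/kernel/drivers/video"
DEST_MODULE_LOCATION[3]="/kernel/drivers/video"
AUTOINSTALL="yes"
</pre>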
I also have access to the upstream DKMS repository, in case we need to fix/change something quickly.
Kernel tool
Regarding binary kernel modules (kmods) we need to adjust a couple of things (this was also another set of emails between Hans and me):
- I ship a kmodtool for generating kABI kmods in RHEL packages.
- A very old kmodtool is shipped in Fedora inside redhat-rpm-config: http://pkgs.fedoraproject.org/cgit/rpms/redhat-rpm-config.git/tree/kmodtool
- RPMFusion ships a different version of kmodtool in the kmodtool package.
What we could do now is remove the first two kmodtool copies entirely, move the RPMFusion kmodtool to Fedora, and update it to also generate RHEL kABI modules.
This way we will have one single kmodtool, which can then be used for akmods and kmods for Fedora and kABI kmods for RHEL, all in the split RPMFusion repository.
Obsolete stuff
We can probably remove all the GRUB 1 stuff, all the Group tags, etc. For RHEL, upgrades are not supported and there will be no RHEL 5 support soon; for Fedora, I doubt anyone still has GRUB 1 as their bootloader. Removing RHEL 5 support also means removing libnvidia-wfb, the old TLS libraries, etc.
This also ties with the source tarball point above.
Default SLI enablement
We can enable SLI in the new OutputClass configuration; I've discovered that it just works if you put it in the config, including modeset=1 in nvidia-drm. On non-SLI systems, you just get a line in Xorg.log/the journal saying that the system does not contain multiple cards.
https://github.com/negativo17/nvidia-driver/commit/64c48422115f26bef904d280a8c1bcfd836536aa
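For reference, the configuration boils down to something like the following sketch (option values are illustrative; the commit above has the actual change):
<pre>
# /usr/lib/modprobe.d/nvidia.conf
options nvidia-drm modeset=1

# X.org OutputClass snippet, e.g. /usr/share/X11/xorg.conf.d/nvidia.conf
Section "OutputClass"
    Identifier  "nvidia"
    MatchDriver "nvidia-drm"
    Driver      "nvidia"
    Option      "SLI" "Auto"
EndSection
</pre>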
I have a SLI system at home to test if needed.
RPM filters
Now that all the libnvidia* and GL libraries are no longer included in the RPMFusion packages, we can probably have simpler filters for the RPM libraries: basically just filter out %{_libdir}/nvidia, and that will filter out OpenCL and anything that's left. Then, all the eventual packages requiring Nvidia libraries can just use the automatic provides mechanism of RPM.
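In the spec, that would amount to something like this (a sketch; the exact pattern depends on the final file layout):
<pre>
# Keep automatic Provides/Requires for everything except what is installed
# under %%{_libdir}/nvidia (OpenCL and the remaining private libraries).
%global __provides_exclude_from ^%{_libdir}/nvidia/.*$
%global __requires_exclude_from ^%{_libdir}/nvidia/.*$
</pre>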
libvdpau update in EPEL
We actually need to update libvdpau in EPEL to support the additional decoding options provided by any of the current drivers. If there is an API/ABI discrepancy, we can also rebuild the additional packages depending on it.
Stuff for later...
Non GL enabled installations of the driver
This is something I have been facing with the CUDA installations at the university I'm helping, and with quite a few requests. The Nvidia makeself installer addresses this through the --no-opengl-libs parameter.
This targets:
- People with Intel GPU systems and Tesla/GeForce GPUs in the system just for calculation.
- Tesla clusters without display at all.
The installer will basically install all the driver components without all the GL stuff, the GLX module, the X config, etc.
This actually needs to be done at the package level, as the current package pulls in X.org and a lot of other libraries that should not be installed on a terminal-only system.
I haven't tackled this yet due to time constraints, but I guess that for this we could simply generate different xorg-x11-drv-nvidia and xorg-x11-drv-nvidia-libs subpackages (with a different name, of course) that conflict with the base ones and leave out all the non-needed stuff. This was also one of the reasons why I did not choose a base package name starting with xorg-x11-drv, but we can work around it.
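As a rough sketch of how such a variant could be laid out (the "headless" name and the package content are purely hypothetical):
<pre>
%package -n nvidia-driver-headless
Summary: NVIDIA driver for systems without GL/X (compute only)
Conflicts: xorg-x11-drv-nvidia
Conflicts: xorg-x11-drv-nvidia-libs

%description -n nvidia-driver-headless
Kernel module, nvidia-persistenced, nvidia-smi and the CUDA libraries,
without the GLX/EGL libraries, the X.org driver and the X configuration.
</pre>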
Hardening of persistence daemon
Add the additional hardening options to the systemd unit file, for example:
http://git.scrit.ch/srpm/python-onionbalance/tree/SOURCES/onionbalance.service#n25
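Concretely, that could be a drop-in along these lines (a sketch; which directives the daemon actually tolerates needs testing):
<pre>
# /etc/systemd/system/nvidia-persistenced.service.d/hardening.conf
[Service]
# Reduce the daemon's view of the filesystem and drop privilege escalation.
ProtectSystem=full
ProtectHome=true
PrivateTmp=true
NoNewPrivileges=true
</pre>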