[Updated for PixInsight 1.8.8-6] PixInsight, StarNet++ and CUDA – Gotta Go Fast

This guide has been updated for PixInsight 1.8.8-6, which ships with StarNet++ included. It no longer works with the old StarNet++ version. If you followed an earlier version of this guide, you will need to go through it again from the start, because all dependencies have been updated.

If you are reading this, you most likely found this post through a Google search or were linked to it. What is it for? If you are here, you are looking for a way to speed up StarNet++. And there is one: use the GPU in your system. We’re talking at least 5x faster than before. On my system (Ryzen 7 2700x, GeForce 2080Ti) it went from 3 minutes 45 seconds on my CPU to 25 seconds via CUDA!

There is a caveat: this method only works on 64-bit Windows and only with NVIDIA GPUs. It does not require PixInsight, but the tutorial focuses on getting it to run within the application. Got those prerequisites? Then let’s continue.

Note: this only works with an NVIDIA GPU with CUDA architecture (compute capability) 3.5, 3.7, 5.2, 6.0, 6.1, 7.0 or higher than 7.0. See the list of CUDA-enabled GPU cards.
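As a quick illustration of the rule in the note, here is a minimal Python sketch that checks a compute capability against that list. The function name `is_supported` is hypothetical, and the capability value for your card still has to be looked up in NVIDIA's list of CUDA-enabled GPUs.

```python
# Compute capabilities named in the note; anything newer than 7.0 is also accepted.
SUPPORTED = {(3, 5), (3, 7), (5, 2), (6, 0), (6, 1), (7, 0)}


def is_supported(capability: str) -> bool:
    """Return True if a capability string such as "6.1" satisfies the note."""
    major, minor = (int(part) for part in capability.split("."))
    return (major, minor) in SUPPORTED or (major, minor) > (7, 0)
```

For example, `is_supported("7.5")` returns True, while `is_supported("3.0")` returns False.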

Gathering supplies
(Prerequisites and downloads)

You will need to download several things to get this setup running.

Bringing the system online
(Setup)

Step 1: Replace the tensorflow.dll

  • Open the downloaded libtensorflow-gpu-windows-x86_64-2.3.0.zip
  • Extract the tensorflow.dll from the ‘lib’ folder to ‘C:\Program Files\PixInsight\bin’, overwriting the existing file
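If you prefer to script this step, the following is a minimal Python sketch, not part of the original procedure. It assumes the zip has already been extracted somewhere, and it keeps a copy of the stock DLL so the change can be undone. The helper name `replace_dll` and the `.bak` naming are my own choices, and writing to ‘C:\Program Files’ usually requires an elevated prompt.

```python
import shutil
from pathlib import Path


def replace_dll(new_dll: Path, target_dir: Path) -> Path:
    """Copy new_dll into target_dir, keeping any existing file as <name>.bak."""
    target = target_dir / new_dll.name
    backup = target.with_name(target.name + ".bak")
    if target.exists() and not backup.exists():
        # Keep the stock DLL so the original state can be restored later.
        shutil.copy2(target, backup)
    shutil.copy2(new_dll, target)
    return target


# Example call (the extraction folder is a hypothetical location):
# replace_dll(Path(r"C:\Temp\libtensorflow\lib\tensorflow.dll"),
#             Path(r"C:\Program Files\PixInsight\bin"))
```

Restoring the original state then only requires copying the `.bak` file back over tensorflow.dll.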

Step 2: Install CUDA

  • Run the cuda_10.1.243_win10_network.exe
  • Select ‘Custom (Advanced)’
  • Deselect everything but CUDA -> Runtime -> Libraries
  • Make sure to deselect the Demo Suite in Libraries too
  • Press next until installation is done

Step 3: Install cuDNN

  • Open the downloaded cudnn-10.1-windows10-x64-v7.6.5.32
  • Extract the folder ‘bin’ from the included folder ‘cuda’ to ‘C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1’
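Extracting ‘bin’ here merges the cuDNN files into the existing CUDA ‘bin’ folder; it does not replace what the CUDA installer put there. A minimal Python sketch of that merge follows, assuming Python 3.8 or newer for `dirs_exist_ok`. The function name `merge_cudnn_bin` is hypothetical.

```python
import shutil
from pathlib import Path


def merge_cudnn_bin(cudnn_cuda_dir: Path, cuda_root: Path) -> list:
    """Copy the files in <cudnn_cuda_dir>/bin into <cuda_root>/bin.

    Files already present in the CUDA bin folder are left in place;
    only files with the same name are overwritten.
    """
    src = cudnn_cuda_dir / "bin"
    dst = cuda_root / "bin"
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return sorted(p.name for p in src.iterdir())
```

The return value lists the file names that were copied, which makes it easy to confirm that the cuDNN DLL actually arrived.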

Step 4: Edit Environment Variables

  • You will need to set 2 environment variables in Windows so everything runs flawlessly
  • Open the start menu and search for ‘environment’, select ‘Edit the system environment variables’
  • In the window, open ‘Environment Variables’. Under the system variables (not the user variables), click ‘New’ and enter “TF_FORCE_GPU_ALLOW_GROWTH” as the name and “true” (lowercase) as the value, then press OK to confirm
  • Look for the variable called ‘Path’, select it and click ‘Edit’. If the folder “C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin” is not listed, add it by pressing ‘New’
  • Press OK to close everything
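To double-check the two variables without clicking through the dialogs, a small Python sketch like the one below can help. The function name `check_environment` is hypothetical. It treats only the lowercase value “true” as valid, which is what a reader reports in the comments, and it splits ‘Path’ on semicolons as Windows does.

```python
from typing import Mapping

CUDA_BIN = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin"


def check_environment(env: Mapping[str, str], cuda_bin: str = CUDA_BIN) -> list:
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []
    if env.get("TF_FORCE_GPU_ALLOW_GROWTH") != "true":
        problems.append('TF_FORCE_GPU_ALLOW_GROWTH is not set to "true"')
    # Windows names the variable "Path", so look it up case-insensitively.
    path_value = next((v for k, v in env.items() if k.upper() == "PATH"), "")
    entries = {e.strip().rstrip("\\").lower() for e in path_value.split(";") if e.strip()}
    if cuda_bin.rstrip("\\").lower() not in entries:
        problems.append("CUDA bin folder is missing from Path")
    return problems


# On the target machine, from a newly opened console:
# import os; print(check_environment(os.environ))
```

Run it from a console opened after the variables were changed, since programs that were already running keep their old environment.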

Step 5: Verify everything works

  • Open the Windows Task Manager, go to the ‘Performance’ tab and select GPU 0
  • In one of the drop-downs, select ‘Cuda’ as the performance metric
  • Note: you might not have ‘Cuda’ as a performance metric. This is OK; in that case you will just see the overall GPU usage rise.
  • Open PixInsight and load in one of your images, ideally one that is non-linear (or stretch it quickly)
  • Run the StarNet process from PROCESS -> <All Processes> -> StarNet
  • Edit it to include the path to the downloaded weights in C:\Program Files\PixInsight\library\
  • If you do not have the files there, make sure to update your PixInsight through Resources -> Updates -> Check for Updates
  • Apply the default Process on the image
  • Watch the GPU or CUDA utilization in Task Manager; it should spike shortly after the StarNet process has begun
  • If it does not and your CPU spikes instead, something went wrong. Verify that you completed all steps (especially the Environment Variables)
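If the CPU still does all the work, one thing worth ruling out is a missing library in the CUDA ‘bin’ folder. The sketch below is a troubleshooting aid under stated assumptions, not part of the original procedure: the two file names are what I would expect from CUDA 10.1 and cuDNN 7.6.5, and the helper name `missing_dlls` is hypothetical. Adjust the names if your versions differ.

```python
from pathlib import Path

# Assumed file names for CUDA 10.1 and cuDNN 7.6.5; not taken from the guide.
EXPECTED_DLLS = ("cudart64_101.dll", "cudnn64_7.dll")


def missing_dlls(cuda_bin: Path, names=EXPECTED_DLLS) -> list:
    """Return the expected DLL names that are not present in cuda_bin."""
    return [name for name in names if not (cuda_bin / name).is_file()]


# Example call on the target machine:
# missing_dlls(Path(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin"))
```

An empty list means both files were found; a missing cuDNN file usually points back to Step 3.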

Gotta Go Fast
(Benchmarks)

There is not much I can say here; I had only tested it on one image at the time of writing. Nonetheless, these are the results for an 8.3 MP image drizzled to 2x scale, with StarNet running at Stride 128.

[Screenshot: StarNet on Ryzen 7 2700x]
[Screenshot: StarNet on GeForce 2080Ti]
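For reference, the speedup implied by the timings quoted in the introduction (3 minutes 45 seconds on the CPU, 25 seconds via CUDA) can be worked out with a few lines of Python. The helper names are hypothetical, and the same function can be used on your own before and after timings.

```python
def to_seconds(duration: str) -> int:
    """Convert a "m:ss" duration such as "3:45" to seconds."""
    minutes, seconds = duration.split(":")
    return int(minutes) * 60 + int(seconds)


def speedup(before: str, after: str) -> float:
    """Return how many times faster the 'after' run is than the 'before' run."""
    return to_seconds(before) / to_seconds(after)


print(speedup("3:45", "0:25"))  # 225 s / 25 s -> 9.0
```

That is a factor of 9, comfortably above the "at least 5x" mentioned at the start.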

Thanks for reading, enjoy the new sped up StarNet and have clear skies!

64 Replies to “[Updated for PixInsight 1.8.8-6] PixInsight, StarNet++ and CUDA – Gotta Go Fast”

  1. this is awesome! with a stride of 32 (to limit tiling artifacts) it went from 14m58 to 1m32 (on a Threadripper 3970x with an NVIDIA 2070 Super)

    thanks

  2. Hello, I followed this tutorial step-by-step and now I get an error while trying to run StarNet++ in Pixinsight?
    I was able to run it fine beforehand, and I am certain I followed each step correctly.

    Do you have any possible explanation or fix?
    Thank you,
    Stefan

    PIXINSIGHT ERROR LOG:

    StarNet: Processing view: masterLight_BINNING_1_FILTER_Ha_EXPTIME_180_integration
    Writing swap files…
    434.667 MiB/s
    Starting star removal procedure…
    Stride: 128
    Image size: 3672×5496
    Number of channels: 1
    Color space: Grayscale
    Bits per sample: 32
    Has alpha channels: false
    Float sample: true
    Restoring neural network checkpoint…
    Done!
    Total number of tiles: 1247
    *** PCL Win32 System Exception: At address 00007FFDAF89B403 with exception code C0000005 :
    Access violation: invalid memory read operation at address 0000000000000024

    *** Backtrace Information ***
    ================================================================================
    0: in module: C:\Program Files\PixInsight\bin\PixInsight.exe at address: 0x2
    1: _C_specific_handler in module: C:\Program Files\PixInsight\bin\VCRUNTIME140.dll at address: 0x9E3EB860
    2: is_exception_typeof in module: C:\Program Files\PixInsight\bin\VCRUNTIME140.dll at address: 0x9E3E22E0
    3: is_exception_typeof in module: C:\Program Files\PixInsight\bin\VCRUNTIME140.dll at address: 0x9E3E22E0
    4: is_exception_typeof in module: C:\Program Files\PixInsight\bin\VCRUNTIME140.dll at address: 0x9E3E22E0
    5: _CxxFrameHandler3 in module: C:\Program Files\PixInsight\bin\VCRUNTIME140.dll at address: 0x9E3EC100
    6: _chkstk in module: C:\Windows\SYSTEM32\ntdll.dll at address: 0xA4FE10B0
    7: RtlRaiseException in module: C:\Windows\SYSTEM32\ntdll.dll at address: 0xA4FA9E70
    8: KiUserExceptionDispatcher in module: C:\Windows\SYSTEM32\ntdll.dll at address: 0xA4FDFE10
    9: KiUserExceptionDispatcher in module: C:\Program Files\PixInsight\bin\tensorflow.dll at address: 0xA4FDFE10
    10: KiUserExceptionDispatcher at address: 0xA4FDFE10
    11: KiUserExceptionDispatcher at address: 0xA4FDFE10
    12: KiUserExceptionDispatcher at address: 0xA4FDFE10
    13: malloc_base in module: C:\Windows\System32\ucrtbase.dll at address: 0xA1ED2560
    14: RtlFreeHeap in module: C:\Windows\SYSTEM32\ntdll.dll at address: 0xA4F7FB40
    15: free_base in module: C:\Windows\System32\ucrtbase.dll at address: 0xA1ED14B0
    ================================================================================
    Reading swap files…
    1838.365 MiB/s

    1. Hello, make sure your drivers are on the latest version and make sure you have the TF_ environment variable set. Those two things are the most likely culprits.

      1. You’re correct, my NVIDIA driver was out of date! Updated and it works perfectly now.

        Thanks so much for your advice and for this amazing technique!

  3. Hi, I had it working. The next day when I tried it (after a restart) there was an error
    ‘Checkpoint file not found!’

    1. If you get that error, it’s likely that PixInsight relaunched itself after an update. Type “cd bin” in the Process Console and it will work again. Or just restart PixInsight, but make sure you start it from the Start menu.

  4. that worked but I don’t understand why.
    I did update a resource in PixInsight and I normally start PixInsight from a copied icon from the start menu on the desktop.

    Also, I could not type in anything from the Process Console so I’m curious how to do that.

    thanks for the help

    1. When PixInsight relaunches itself after an update, it sets its working directory to “c:\program files\pixinsight”; when launched normally, it sets it to “c:\program files\pixinsight\bin”. StarNet looks for its local files in the current working directory, but those files are in “bin”. If PI launches with a different working directory, StarNet fails to find them. Typing “cd bin” in the Process Console (you can type at the very bottom of the console; you need to click in the lighter area) moves the working directory to bin, and StarNet can then find its files.

  5. Update:
    I just noticed the extra grey bar to type in commands.
    I typed it in (after loading a project) and I got an error (in red) saying the directory did not exist so if I’m saving files in a directory, I must have to use the full pathname which would be a pain. Do I then have to go back to my file directory with the images?

    Error from console window below:
    cd bin
    *** Error: The target directory doesn’t exist: bin

  6. thanks (and yes, already in bin as you suggest — at least I’ve learnt something tonight)
    Don’t know why I didn’t think of that after 40+ yrs of hands-on computing.
    (brain fail in PixInsight imaging mode)

  7. I’m not seeing the Cuda option in task manager either. This may be unavailable on some machines based on a quick google search

  8. It did not work for me:/ same speed. I followed the steps to the point several times. Maybe coz I’m using Windows 7. GPU is NVIDIA 650. One thing that’s different though is when I try to edit the Path, I get only one line that i can edit; not several lines like the one in your screenshot :/

    1. For one, you should really upgrade from an unsupported system. For the other, in Windows 7 the path needs to be appended after the existing entries, separated by a semicolon. Run the standalone StarNet++, see what error messages you get, and continue from there.

  9. This is fantastic, thank you so much!

    BTW, as others have mentioned “CUDA” is not a choice as a pull down in the GPU displays under Task Manager for me either, but selecting “Compute_0” does the job of showing GPU loading…and of course the usual all-cores-100% CPU graph is mostly idling now.

  10. Sorry, I’m not sure if this has been covered, but if using standalone StarNet++ on Windows 10, where do we extract the tensorflow.dll file to if not using PixInsight?

  11. Darkarchon:

    Please note in your directions you have “GROWTH” and “GROTH” for the environment variable.
    I assume you meant “GROWTH”…

    I cannot get this to work on my system. I DID follow all of the instructions (and paid particular attention..I am surprised no one has found your error yet.)

    I do have two GPUs. My NVIDIA is on my GPU1 … could this be an issue? I note you assume GPU0.

  12. Thank you for detailing these steps Darkarchon – what a difference in Pixinsight for Starnet++ with this configured!

  13. Absolutely amazing! Works on a Ryzen 5 2600x, EVGA NVIDIA RTX 2060 super clocked, Windows 10 64-bit. Using Starnet++ with any stride smaller than 128 typically meant going for coffee and doughnut..this piece of programming may actually save my life 🙂 Thank you very much for your work!

  14. Made a fake account (not my kind of thing) and got it working.
    Thanks for the work.
    GTX 1050 from >3 mins down to just <1min.

  15. Darkarchon

    Great instructions. I have a Ryzen 9 3900x and GTX1660 super and this is fantastic. My test image went from 46 seconds with 60 percent CPU usage to 14 seconds with 90 percent Cuda usage, 5.6 out of 6 GB RAM usage and the CPU idling at 6 percent. Outstanding!

  16. Darkarchon,
    Thanks for the great instructions. I don’t know how you figured it out, but once I got it right it works great.
    I don’t know what happened but somehow I ended up with Cuda v11.0 and the processing was only on the CPU.
    I went back over everything, redownloaded the files and again followed your instructions. This time I ended up with Cuda v10.0 and everything worked. Starnet ++ went from just over 2 minutes to 17 seconds, Wow!
    Thanks

  17. Hi, thanks for the great instructions but it did not work for me. I have an NVIDIA GeForce GTX 650 Ti and have gone through the installation to make sure I followed it exactly. However, when I run StarNet, my CPU usage shoots up to 70% and the GPU just idles at 1%. I have checked the environment settings a number of times but to no avail. I have also checked the 650 Ti driver and it says it has the latest version. If you can provide further checks that would be good.

  18. Excellent guide! I have two GPU’s but I only see one getting used in task manager.

    Is it possible to enable utilization of both?

  19. Hi,

    Thanks for putting so much work into the project! I had everything working in PixInsight 1.8.8.5, now, after the upgrade to 1.8.8.6 – even though I followed your updated instructions – StarNet will only use the CPU instead of my Quadro M620.

    I noticed a mismatch between your download links in the prerequisites and the cudnn version in step 3. I guess cudnn-10.1-windows10-x64-v8.0.2.39 is the correct version?

    Thanks,
    Sebastian

        1. Make sure to have the environment variables set and you might need to reboot your machine. If it still doesn’t work restore the original state and wait for PI to release their own GPU accelerated Tensorflow.

          1. …I restored PI to its original state. Let’s wait for the official release.
            Thanks for your help!

  20. Hi, thanks for your tutorial!
    Is there an easy way to turn Cuda on and off to compare before and after when you have everything installed and running? My best PC is a laptop with the i5-1035G1 and the NVidia GeForce MX250 which is a low-end GPU but has a specification of 364 Cuda cores. But I’m not convinced I’ve benefited from a speed up. In task manager Cuda is shown running around 95% while the CPU is still running typically around 70-80%. Just wondered if the CPU is working so hard just to handle the GPU traffic, or is it still bearing some of the original burden?

  21. For all who have the same trouble setting up CUDA and PI: you have to use the given release number. I ran into a failure during setup with CUDA Release 11.

    Best regards

    Jan

  22. Hi,
    thanks a lot for your recipe!

    I don’t see Starnet in the list of modules in PI 1.8.8-6.
    Probably, ’cause my pretty old processor (Intel Core i7 970) doesn’t support AVX instructions.
    I’ve tried to overcome it with the use of my GPU (GTX 1060) as described above but it doesn’t work. I’ve done all manipulation, re-booted, checked twice that environment variables were in place.

    Does it mean that Starnet (or PI) somehow check compatibility of CPU with AVX instructions set and if AVX isn’t supported by CPU then Starnet will not be available in PI?

    Thanks a lot for your help!

  23. Hi, I used to have this working just fine on my Geforce 1080 GT. I just upgraded to a 3080 RTX and now it no longer works. When I start the starnet process, PI just freezes up. Any ideas on how to fix? Thanks!

  24. I just followed through all the install instructions. The good news is, I didn’t break Starnet++, the bad news is, it is still not picking up my RTX 2070 for CUDA processing. I had a few questions and one possible complication:

    1) Step 3 says to copy the bin folder but the picture seems to imply both the bin and lib folders are copied over to the C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1 location? I tried it both ways with no luck

    2) In this same step, does the bin folder REPLACE the existing folder (which has a lot of other stuff already in it) or does the file IN the bin folder get dropped into the existing BIN folder? I have also tried this both ways…

    My complication might be that I had a version of this toolkit 10.2 already installed from previously. Initially I left that alone and just followed these instructions. When I could not get that to work I eventually uninstalled all the 10.2 content I could find identified under my installed programs list. That did not seem to help either.

    So at this point, I seem to have everything installed following your instructions, but I am still running my CPU instead of the GPU when running starnet under the latest 1.8-6 Pixinsight release. One other point of info, my CPU is a Ryzen 3700x…

    Thanks for any advice / info you can give me to try to make this work…

    ML

    1. Hi,

      Make sure to have the environment variables set correctly for the system, not the user, and download the correct cudnn. It all needs to go into the same folder as cuda 10.1. In step 3 you replace the folder, but all that does is add the contents of cudnn to the existing cuda folder, which should cause no issues.

  25. Any idea what’s needed to get this running on a RTX 3090? It worked just fine with my old 2080ti and still runs after some updating, but I don’t see any CUDA utilization in the task manager (nor is it running as quickly as it had been).

    1. Which version of the 3080. I have the MSI Trio X and I cannot for the life of me get this to work even with 8.0.5 running on 10.1. I have tried starting fresh and it still is not working. Any suggestions would be appreciated.

  26. Let me just save anyone trying to get fancy out there and use all the latest versions of CUDA and cuDNN.

    Don’t. Just download exactly what you see here. Copy n paste the settings. All will be good.

  27. I am very impressed. Santa gave me a new 8 core AMD laptop with a GTX 1660 Ti card. Doing a 32 stride on my old i5 desktop took around 3 to 4 hours. This process did it in just over four mins and only used about 80 % of the GPU. CPU barely moved about 20%.

  28. WOW! This cut my time from 20 minutes to 4. Will it speed up any other processes or just Starnet?

  29. It took me quite a while but I finally got it to work. My gains aren’t as good as most but it’s still awesome. StarNet++ time dropped from 6:38 to 1:46. I did have some issues which I’ll post for others struggling.
    The biggest issue was finding the correct (older) version of CUDA. The link above did not seem to work for me and when I went to the nVidia developer site the latest version 11.2 was all I could find. I used this and changed folder names from 10.1 to 11.2 when appropriate but I couldn’t get it to work. I then found an archived copy of the older version here: https://developer.nvidia.com/cuda-10.1-download-archive-update2
    I redid everything and was very hopeful – it STILL didn’t work :(. I then closed Pixinsight, reopened it and VOILA, success!
    Not sure if it was my idiocy or perhaps the link above needs an update. The other two links worked fine. BTW, I’m using Pixinsight 1.8.8-7, a Ryzen 3900 CPU and an nVidia Quadro P2200 (which I just bought yesterday to replace my AMD card exclusively for this).
    Thanks so much Darkarchon! Your EZprocessing suite is a miracle for beginning processors like me and speeding things up 3-4 fold is just icing on the cake. You da man!
    Tim

  30. Works perfectly. Thank you very much!
    Processingtime drops from 1m41s to 26s.
    Hope more processes would benefit from cuda

    BR
    Thomas

  31. I have a GeForce GTX 1050 on a Dell laptop. After double and triple checking that I followed the instructions it still did not work. What did work was updating the NVIDIA drivers. DO NOT DO THIS THROUGH WINDOWS. Windows will happily tell you that you have the best drivers for your hardware when you don’t. Instead, go directly to the NVIDIA website and download the drivers for your card. I have not done any benchmarks but StarNet is WAY faster.

  32. This worked for me with my older i7-5820K system, using a GTX 970 on PixInsight 1.8.8-8.

    I just upgraded to a Ryzen 5950X and RTX 3060 Ti, and the new version 1.8.8-9 of PixInsight and these steps did not work.

    Eventually switching to the 2.4.0 version of Tensorflow did work, as in Starnet would run, and it was a bit faster (3:25 stock, 1:57 with Tensorflow 2.4.0) but it was also not really using the GPU, or at least not a lot as the CPU was still around 60%.

    I followed Supernovae’s advice and tried these steps with Cuda 11 and Cudnn 8.2.4.15 but still no better than about 1:57 when I run Starnet.

    Maybe something changed in PixInsight 1.8.8-9?

  33. Fixed! I had previously installed Cuda 10 before trying Cuda 11. I had to uninstall Cuda 10, and update the CUDA_PATH environment Variable to:

    C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0

    (note this is in addition to the CUDA_PATH_V11_0 variable with the same value)

    Also of critical importance, the value “true” for the variable TF_FORCE_GPU_ALLOW_GROWTH is case sensitive, it *must* be lowercase.

    Making those changes this now works and at 128 strides it only takes 0:25 where before it was taking 3:25.
