Nanjing University Unified Identity Authentication Verification Code Recognition: A Complete Open Source Practice from Dataset Construction to Model Deployment

This article is synced to xLog via Mix Space. For the best reading experience, visit the original link: https://www.do1e.cn/posts/code/nju-captcha


Introduction#

In my previously written NJUlogin, username/password login required CAPTCHA recognition; at the time I used ddddocr, which achieved good accuracy.

I also deployed a server and asked a friend to help write a Tampermonkey script, so that the CAPTCHA is filled in automatically every time I need to log in (the username and password are autofilled by the browser), leaving only the login button to click.

However, recently I thought about making the recognition model lighter for easier deployment on edge devices, leading to this project. (Could you give me a Star? >︿< If you just want to use it and don’t want to learn about the related technology, just scroll to the end, I recommend the NJU server API version.)

Implementation Effect

Data Collection#

https://github.com/Do1e/NJUcaptcha/tree/main/build_dataset

The dataset construction is mostly automated, relying mainly on the following two tools:

  • ddddocr: for preliminary CAPTCHA recognition
  • NJUlogin: for verifying the correctness of recognition results

I slightly modified NJUlogin to determine whether each recognition was correct, and saved the images into different folders accordingly. For the incorrectly recognized ones (a few hundred or so), I only needed to rename them by hand.
Collecting 100,000 images took about 3-4 days of running in the background, and time.sleep couldn't be set too small, otherwise the IP would get blocked. >︿<
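The collection loop described above can be sketched as follows. This is a minimal illustration, not the repo's actual scripts: `fetch_captcha`, `guess_captcha` (ddddocr), and `try_login` (NJUlogin) are hypothetical callables standing in for the real tools.

```python
import hashlib
import time

def make_filename(text: str, img: bytes) -> str:
    """Dataset naming scheme: {captcha text}_{image md5}.jpg, all lowercase."""
    return f"{text.lower()}_{hashlib.md5(img).hexdigest()}.jpg"

def collect_loop(fetch_captcha, guess_captcha, try_login, n=100_000):
    """fetch_captcha() -> bytes, guess_captcha(bytes) -> str,
    try_login(str) -> bool: hypothetical wrappers around the real tools."""
    for _ in range(n):
        img = fetch_captcha()
        guess = guess_captcha(img)
        # sort by whether the login accepted the guess; wrong ones
        # get renamed by hand later
        folder = "correct" if try_login(guess) else "wrong"
        with open(f"{folder}/{make_filename(guess, img)}", "wb") as f:
            f.write(img)
        time.sleep(2)  # too small a delay gets the IP blocked
```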

Thus this dataset was created; you are welcome to download and use it. It contains 100,000 CAPTCHA images named {CAPTCHA text}_{image md5}.jpg, with all CAPTCHA text in lowercase.
Dataset download link: NJU-captcha-dataset.7z
Decompression password: @Do1e

The dataset is as follows:

https://github.com/Do1e/NJUcaptcha/blob/main/model/dataset.py
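Since the label is embedded in each filename, the dataset loader can read it straight from the path. A minimal sketch of that parsing, assuming a digits-plus-lowercase alphabet (the function and `CHARSET` here are illustrative, not the repo's actual `dataset.py` code):

```python
import os
import string

# assumed alphabet: digits then lowercase letters, 36 classes total
CHARSET = string.digits + string.ascii_lowercase

def parse_label(path: str) -> list[int]:
    """'abcd_<md5>.jpg' -> per-character class indices."""
    text = os.path.basename(path).split("_")[0]
    return [CHARSET.index(c) for c in text]
```

A PyTorch `Dataset` would then pair these indices with the decoded image tensor in `__getitem__`.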

Recognition Model#

https://github.com/Do1e/NJUcaptcha/tree/main/model

With the data ready, I could design and train the model. This time I handed the model design entirely over to AI, and the results were quite decent.

| Metric | Before | After |
| :--- | :--- | :--- |
| Model size | 12.98 MiB | 2.25 MiB |
| Accuracy | 99.37% | 99.83% |
| Throughput (AMD Ryzen 7 8845H) | 173.95 images/sec | 1076.56 images/sec |
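As a rough sanity check on those sizes (assuming fp32 weights at 4 bytes per parameter and ignoring ONNX file overhead), the reported model sizes correspond to roughly these parameter counts:

```python
def approx_params(size_mib: float, bytes_per_param: int = 4) -> int:
    """Rough parameter count implied by an on-disk model size."""
    return round(size_mib * 1024 * 1024 / bytes_per_param)
```

So the shrink from 12.98 MiB to 2.25 MiB is roughly 3.4M parameters down to about 590K.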

https://github.com/Do1e/NJUcaptcha/blob/main/model/model.py
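A compact captcha recognizer of this kind is often a small conv stack with one classification head per character position. The sketch below is a hedged illustration of that pattern, not the repo's actual `model.py`: the layer sizes, the 4-character length, and the 36-class alphabet are all assumptions.

```python
import torch
import torch.nn as nn

class TinyCaptchaNet(nn.Module):
    """Illustrative multi-head captcha classifier (not the repo's model)."""
    def __init__(self, n_chars: int = 4, n_classes: int = 36):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 8)),  # fixed-size features for any input
        )
        self.heads = nn.ModuleList(
            [nn.Linear(128 * 4 * 8, n_classes) for _ in range(n_chars)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.features(x).flatten(1)
        # one (batch, n_classes) logit tensor per character position
        return torch.stack([h(f) for h in self.heads], dim=1)
```

Training then sums a cross-entropy loss over the per-position heads; exporting to ONNX makes the same forward pass usable from ONNX Runtime Web later.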

Maybe it can be a bit smaller? Let’s save that for the next upgrade

Server Deployment#

https://github.com/Do1e/NJUcaptcha/tree/main/service

Previously, I also implemented a simple recognition server using FastAPI, which decodes the received base64 image and returns the CAPTCHA text. This time I took the opportunity to deploy it on Vercel. Test command on Linux:

curl -s -L "https://authserver.nju.edu.cn/authserver/captcha.html" -o "captcha.jpg" \
  && [ -f "captcha.jpg" ] \
  && curl -s -X POST -H "Content-Type: application/x-www-form-urlencoded" \
       -d "captcha=$(base64 -i captcha.jpg | tr -d '\n')" \
       "https://njucaptcha.vercel.app" \
  || { echo "Failed to download captcha image"; exit 1; }
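The same round trip, sketched in Python with only the standard library (the endpoint URLs are the ones from the curl command above; the function names are illustrative):

```python
import base64
import urllib.request
from urllib.parse import urlencode

CAPTCHA_URL = "https://authserver.nju.edu.cn/authserver/captcha.html"
API_URL = "https://njucaptcha.vercel.app"

def encode_payload(img: bytes) -> bytes:
    """Build the x-www-form-urlencoded body: captcha=<base64 of image>."""
    return urlencode({"captcha": base64.b64encode(img).decode()}).encode()

def recognize(img: bytes) -> str:
    """POST the image to the recognition API and return the captcha text."""
    req = urllib.request.Request(
        API_URL,
        data=encode_payload(img),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    with urllib.request.urlopen(req, timeout=15) as resp:
        return resp.read().decode()
```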

Tampermonkey Script Auto-Fill#

As mentioned in the introduction, to achieve automatic CAPTCHA recognition and input during login, I wrote a Tampermonkey script for auto-filling. The previous version was based on the server:

https://github.com/Do1e/NJUcaptcha/blob/main/njucaptcha.user.js

The open-source code still uses the vercel service, which is very slow, and it cannot be used when logging into p.nju. ( ̄﹃ ̄)

My own solution is to set up a service on campus and map it to my public server through frp, accessing the internal service when logging into p.nju:

// public endpoint (reachable off campus) and campus-internal endpoint
const url_pub = 'https://example.com/';
const url_nju = 'https://nju.example.com/';
// p.nju.edu.cn is the campus network gateway: before login there is no
// internet access, so the internal endpoint must be used there
const currentUrl = window.location.href;
const serverUrl = currentUrl.includes('//p.nju.edu.cn') ? url_nju : url_pub;

The most challenging part of the entire project was running ONNX inference directly on the client side. It took me several hours with AI tools to figure out, and it is implemented using ONNX Runtime Web.

https://github.com/Do1e/NJUcaptcha/blob/main/njucaptcha_onnx.user.js
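Conceptually, the client-side userscript runs the ONNX model and argmaxes per-position logits into text. The decode step is mirrored below in Python; the 36-character digits-plus-lowercase alphabet and the tensor/session names in the comment are assumptions, not the userscript's actual code.

```python
import string

CHARSET = string.digits + string.ascii_lowercase  # assumed 36-class alphabet

def decode(logits: list[list[float]]) -> str:
    """(n_chars, n_classes) logits -> captcha text via per-position argmax."""
    return "".join(CHARSET[row.index(max(row))] for row in logits)

# The equivalent server-side call with onnxruntime would look like:
#   import onnxruntime as ort
#   sess = ort.InferenceSession("captcha.onnx")
#   logits = sess.run(None, {"input": tensor})[0][0]
#   text = decode(logits.tolist())
```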

One downside of the ONNX version is that, with a cold cache, it needs internet access to download some inference dependencies; after the first use they are cached. However, ort-wasm-simd-threaded.jsep.mjs and ort-wasm-simd-threaded.jsep.wasm can only be cached for 7 days, which is not long. If someone knows a way to achieve near-permanent caching like @resource, please submit a PR.

In summary, both solutions have their pros and cons. The most recommended is still to deploy and use it yourself as I did, or directly use the NJU server API version I provided at the end.

The above versions of the Tampermonkey script can be installed directly by clicking the links below (provided you have the Tampermonkey extension installed):

| | Vercel API version | NJU server API version | ONNX local inference version |
| :--- | :--- | :--- | :--- |
| Advantages | No need for scientific internet access | Best practice, personally considered quite perfect | Very fast, filled in before the page finishes loading, and usable when logging into p.nju (once cached) |
| Disadvantages | Very slow, and cannot be used when logging into p.nju | Requires deployment on both an internal and an external server; I will not be able to use it after graduation | Requires scientific internet access to download dependencies when the cache is cold, cannot be used when logging into p.nju, and the cache lasts only 7 days |


Note: the code this time uses the GPL-3.0 open-source license, so please ignore the license notice below on the webpage. I'm too lazy to change the site code; what my post says takes precedence, right?
