UMass CTF | Reader Exercise w/ Gröbner

A bit ago, I played the UMass CTF with the CyberHero team. We performed pretty well, placing third overall, and had the opportunity to solve a lot of really cool challenges. In particular, I found Reader Exercise especially interesting, as it made me learn a bit about Gröbner bases (an approach that actually turned out to be unintended!).


Reader Exercise

Author: ming

10 solves / 498 points

Kevin the sea cucumber became a math competition addict and developed this weird verification scheme. We could only recover this source file that’s missing a function that looks pretty important. Before retreating back into his math competition preparation cave, he said that breaking this scheme would be left as “an exercise for the readers”, so we now need your help to retrieve the flag!

The challenge implements polynomials over a field as one would expect with the Polynomial class and the (hidden) gen_pair function, so I’ll leave out that part for brevity. The meat of the challenge is here:

if __name__ == "__main__":
    with open("flag.txt", "r") as f:
        FLAG = f.read()
    size = 500
    base = 16
    degree = 8
    print(f"Gimme a sec to generate the prime...")
    while True:
        n = getPrime(size)
        if n % (base * 2) == 1:
            break
    print(f"n = {n}")

    p, q = gen_pair(degree, n)

    assert isinstance(p, Polynomial) and isinstance(q, Polynomial)
    assert p.degree() == degree
    assert q.degree() < p.degree()

    p_squared = p * p
    q_squared = q * q
    while True:
        decision = input("What would you like to do?\n")
        if decision == "challenge":
            challenge = int(input("I will never fail your challenges!\n"))
            proof = (p_squared(challenge) + q_squared(challenge)) % n
            assert proof == (pow(challenge, base, n) + 1) % n
            print(f"See? {proof}")
        elif decision == "verify":
            token = getRandomNBitInteger(size - 1) % n
            print("Here's how verification works: ")
            print(f"I give you:")
            print(f"token = {token}")
            print(f"You should give back:")
            print(f"p(token) = {p(token) % n}")
            print(f"q(token) = {q(token) % n}")

            print(f"Simple enough, right?")
            token = getRandomNBitInteger(size) % n
            print(f"token = {token}")
            p_attempt = int(input("p(token) = "))
            q_attempt = int(input("q(token) = "))
            assert p_attempt == p(token) % n and q_attempt == q(token) % n
            print("Great job!")
            print(FLAG)
            break
        else:
            print("Probably not that...")

So, we’re given two polynomials \(p(x)\) and \(q(x)\) such that \(p(x)^2 + q(x)^2 = x^{16} + 1\) modulo a known prime of the form \(n = 32k + 1\), with \(\text{deg}(q) < \text{deg}(p) = 8\). Our goal: given the evaluations of both polynomials at a random but known point \(t_1\), produce their evaluations at another random point \(t_2\). At first, I tried to see whether there was any reason for such a choice of \(n\), and I couldn’t figure it out lol - the intended idea was to guarantee a root of unity, but this didn’t quite click for me at the time.
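(With hindsight, here’s a quick sanity check of what that choice of \(n\) buys. This is just a sketch of mine, assuming \(n\) has already been read from the server:)

# Sketch: n = 32k + 1 guarantees a primitive 32nd root of unity mod n,
# i.e. an element z with z^16 = -1, so x^16 + 1 splits into linear factors mod n.
F = GF(n)                          # n as printed by the server
r = F.random_element()
while r^((n - 1) // 2) != -1:      # keep sampling until we hit a quadratic non-residue
    r = F.random_element()
z = r^((n - 1) // 32)
assert z^16 == -1                  # z is a primitive 32nd root of unity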

So, naturally, we take a pen and paper and write it all out.

Polynomials

We’re actually given a lot more immediate information than might be apparent. Simply brute-forcing the polynomials isn’t feasible, but there’s structure we can exploit. Consider the end-result polynomial \(x^{16}+1\). One of the two polynomials has to have degree 8 (so that its square reaches \(x^{16}\)), and that can only be \(p(x)\). It’s then easy to see that the coefficient of \(x^8\) in \(p(x)\) must square to \(1\) (otherwise the leading coefficient of the result would not be \(1\)!).

More formally, let us write the polynomials \(p(x)\) and \(q(x)\) as:

\[p(x) = \sum_{i=0}^{8}a_ix^i\] \[q(x) = \sum_{i=0}^{7}b_ix^i\]

Symbolically in SageMath, we write this as

R.<x> = SR[]
a = SR.var('a', 8)
b = SR.var('b', 8)
p = sum(a[k] * x^k for k in range(8)) + x^8
q = sum(b[k] * x^k for k in range(8))

As argued above, the coefficient of \(x^8\) in \(p(x)\) is either \(1\) or \(-1\). Which one we pick doesn’t actually make a difference, so we go with the simpler choice of \(1\) - which is why the code hardcodes the \(x^8\) term instead of giving it its own variable.

Now expand the sum of the squared polynomials symbolically and compare it against \(x^{16}+1\): the coefficient of \(x^i\) must be zero for every \(1 \leq i \leq 15\), while the coefficients of \(x^{16}\) and \(x^0\) must both be \(1\). Looking at it like this, we’ve got a system of multivariate equations modulo a prime: 16 unknowns (\(a_0,\ldots,a_7\) and \(b_0,\ldots,b_7\)) and one equation per coefficient of \(x^0\) through \(x^{15}\) (the \(x^{16}\) term is already handled by the leading \(1\)). Trying it out with pen and paper, you can determine a few of the variables (for instance, the \(x^{15}\) coefficient forces \(a_7 = 0\)), but this gets really difficult really quickly - aside from the trivial solution (that being \(p(x) = x^8\) and \(q(x)=1\)), finding anything else by hand is hard.
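For later reference, here’s one way those coefficient equations might be collected in Sage, continuing from the symbolic setup above (a sketch; the name target is mine, while polys and F are the globals the solve snippet further down expects, and it reuses the same .polynomial() conversion that snippet does):

# Sketch: build one equation per coefficient of p^2 + q^2 - (x^16 + 1).
# n is the prime printed by the server; every coefficient must vanish mod n.
F = GF(n)
target = p*p + q*q - (x^16 + 1)
polys = [c.polynomial(F) for c in target.list()]   # coefficients of x^0 .. x^15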

But can a computer do this? Turns out it can!

Mr. Gröbner

So, over the years, I’ve consistently heard about this thing called the Gröbner basis. I knew it had something to do with solving nonlinear equations, but the details eluded me. Turns out, it’s actually a rather complicated topic and nobody really knows what’s going on under the hood - and the somewhat good news is that we don’t have to.

Let \(F\) be a set of multivariate polynomials in a polynomial ring \(R[x_1, \ldots, x_n]\). The ideal \(I\) generated by \(F\) is the set of combinations of elements of \(F\) with polynomial coefficients. Informally put, a Gröbner basis of the ideal \(I\) is a particularly well-behaved generating set - one from which solutions can be read off much more easily. In a way, one can see it as the LLL but for ideals.

The problem, though, is that computing a Gröbner basis is very slow in the general case, so we’d like \(F\) to contain as many polynomials as possible relative to the number of variables in the ring.

Through the lens of our problem, we let \(I\) be the ideal generated by the coefficient equations of our sum of squares, and compute a Gröbner basis of that ideal. The resulting generating set should be trivial to solve: we expect every entry to be of the form \(a_i - c_i\) (or \(b_i - c_i\)) for a constant \(c_i\), i.e. the basis simply reads off the solution.
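As a toy illustration (not the challenge system), here’s what that “reads off the solution” shape looks like in SageMath when a system pins down a single point:

# Toy example: a mixed linear/nonlinear system over GF(101) with a unique
# solution (u, v) = (3, 2); the reduced Groebner basis collapses to
# "variable minus constant" generators.
R.<u, v> = PolynomialRing(GF(101), order='lex')
I = R.ideal([u*v - 6, u + v - 5, u - v - 1])
print(I.groebner_basis())   # expect [u + 98, v + 99], i.e. u - 3 and v - 2 mod 101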

SageMath magic

SageMath has Gröbner basis computation implemented (several backends, even - the notable one being Singular). Some are toy implementations, but there’s a proper one (albeit a slow one) for the multivariate case. But, again, we’d like to feed Sage as much data as we can, lest we fail to solve (or it takes too long).

Prior to our own guess, we actually get two evaluations: \(p(t_1)\) and \(q(t_1)\) for a known token \(t_1\). Each is just a linear combination of the unknown coefficients, which gives us 2 more equations to throw into our solve!

# t:  the point evaluated by the server
# pt: p(t)
# qt: q(t)
# t2: the point we need to evaluate
def solve(t, pt, qt, t2):
    global polys, F  # our coefficient equations and the field GF(n)

    # Add the two aforementioned evaluation equations
    p1 = (sum(a[k] * ZZ(t)^k for k in range(8)) + ZZ(t)^8 - pt).polynomial(F)
    q1 = (sum(b[k] * ZZ(t)^k for k in range(8)) - qt).polynomial(F)
    polys += [p1, q1]

    I = Ideal(polys)
    print("Computing basis...")
    g = I.groebner_basis()  # takes a bit under 2 minutes

    P.<z> = PolynomialRing(F)

    # Reconstruct the polynomials from the new basis:
    # each entry looks like a_i - c_i, so pull out the constant c_i
    ppoly = 0
    for idx, value in enumerate(g[:7]):  # a_7 is skipped since it's zero
        v = -list(value)[1][0]
        ppoly += z^idx * v
    ppoly += z^8
    print(ppoly)

    qpoly = 0
    for idx, value in enumerate(g[8:]):
        v = -list(value)[1][0]
        qpoly += z^idx * v
    print(qpoly)

    return [ppoly(t2), qpoly(t2)]

The above works, and produces an answer within the challenge’s time-frame! It’s in fact what solved the challenge for me.
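For completeness, the interaction can be driven roughly like this (a sketch of mine - HOST and the port are placeholders, and the prompt parsing just follows the server source above):

# Rough driver sketch; HOST/1337 are placeholders, not from the challenge.
from pwn import remote

io = remote("HOST", 1337)
io.recvuntil(b"n = ")
n = int(io.recvline())
# ... build F and polys from n as described earlier ...

io.sendlineafter(b"What would you like to do?\n", b"verify")
io.recvuntil(b"token = ")
t = int(io.recvline())                  # the token we get evaluations for
io.recvuntil(b"p(token) = ")
pt = int(io.recvline())
io.recvuntil(b"q(token) = ")
qt = int(io.recvline())
io.recvuntil(b"token = ")
t2 = int(io.recvline())                 # the token we must answer for
p2, q2 = solve(t, pt, qt, t2)
io.sendlineafter(b"p(token) = ", str(p2).encode())
io.sendlineafter(b"q(token) = ", str(q2).encode())
print(io.recvall().decode())            # "Great job!" followed by the flag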

But it’s still a bit slow, and one gets curious about how to speed it up…

Further reading

So, what do? Looking around, there’s some extra information we could have supplied. The main one: we know that most of the coefficients are units, i.e. not zero, as the trivial solution is the only case where several of them vanish - so, for each coefficient we believe is nonzero, adding a fresh variable \(c_i\) and an equation \(c_i \cdot a_i - 1 = 0\) would help convince Gröbner to avoid that branch. I didn’t actually try that, as the above worked, but it was something I was planning on trying.
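A minimal sketch of what that could look like, reusing the polys, F, a, and b globals from above (untested by me, and which coefficients are actually nonzero is a guess):

# Untested sketch: for each coefficient we believe is nonzero, add a fresh
# "inverse" variable and force coeff * inverse = 1 (the Rabinowitsch trick),
# steering the basis computation away from the trivial solution.
for i, coeff in enumerate([a[0], b[0], b[7]]):   # guessed nonzero coefficients
    inv = SR.var(f"inv{i}")                      # fresh inverse variable
    polys.append((coeff * inv - 1).polynomial(F))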

Another option was simply not using SageMath to compute the Gröbner basis. The big problem with Sage, beyond making C/C++ calls from Python, is the lack of parallelization and threading in Singular’s implementation, making the computation rather single-core bound and slow. There exist a few algorithms faster than Singular accessible from SageMath, but most are limited to working over smaller fields (e.g. Macaulay2). The ones we’d ideally want are locked behind pricey software (Maple, Magma), but those offer trials, and it might have been worth giving them a shot for the sake of the challenge.

TL;DR: Gröbner is pretty much the LLL for ideals, it’s a pretty fun thing, and it can help you solve multivariate problems much more quickly. Just make sure to feed it enough equations and computing power.