shape-warrior-t

joined 1 year ago
 

For some time now, I've been thinking about the concept of interactively manipulating mathematical expressions and equations via software. Like doing some quick algebra in Notepad or similar, except there's no potential for arithmetic/algebra errors, typos, etc. ruining any results.

At the same time, I also wanted to experiment a bit with zippers from functional programming. You need some way of specifying what (sub)expression to perform operations on, and it seemed like this kind of data structure could help with that.

And so, I made AlgeZip, a small proof-of-concept of the whole general idea. Although this polished Python version was completed only a few days ago, there were various other versions before this one in different languages and with worse-quality code. Instructions for things are on GitHub; requires Python 3.12 to run.

For simplicity, I decided to use boolean expressions instead of generic numeric algebraic expressions/equations, and only decided to include the minimum in terms of commands and functionality. From my understanding, it should be possible to transform any boolean expression into any other boolean expression in AlgeZip (without using the r! command except to set things up), though I could be wrong.

Thoughts, comments, and criticism on the idea as a whole, the program, or the source code are welcome, though I'm not sure if I'll be making any changes at this time.

[–] shape-warrior-t@kbin.social 1 points 1 year ago (1 children)

My implementation is memoized by functools.cache, but that is a concern when it comes to recursive Fibonacci. That, and stack overflows, which are also a problem for my code (but, again, not for "reasonable" inputs -- fibonacci(94) already exceeds 2^64).

Time complexity-wise, I was more thinking about the case where the numbers get so big that addition, multiplication, etc. can no longer be modelled as taking constant time. Especially if math.prod and enumerate are implemented in ways that are less efficient for huge integers (I haven't thoroughly checked, and I'm not planning to).

[–] shape-warrior-t@kbin.social 1 points 1 year ago (3 children)

Given an input c, outputs the number of distinct lists of strings lst such that:

  1. ''.join(lst) == c
  2. for every string s in lst, s consists of an arbitrary character followed by one or more characters from '0123456789'

Sure hope I didn't mess this up, because I think the fundamental idea is quite elegant! Should run successfully for all "reasonable" inputs (as in, the numeric output fits in a uint64 and the input isn't ridiculously long). Fundamental algorithm is O(n) if you assume all arithmetic is O(1). (If not, then I don't know what the time complexity is, and I don't feel like working it out.)

from functools import cache
from itertools import pairwise
from math import prod

@cache
def fibonacci(n: int) -> int:
    if n == 0:
        return 0
    if n == 1:
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)

def main(compressed: str) -> int:
    is_fragment_start = [i == 0 or c not in '0123456789' for i, c in enumerate(compressed)]
    fragment_start_positions = [i for i, s in enumerate(is_fragment_start) if s]
    fragment_lengths = [stop - start for start, stop in pairwise(fragment_start_positions + [len(compressed)])]
    return prod(fibonacci(fragment_length - 1) for fragment_length in fragment_lengths)

if __name__ == '__main__':
    from argparse import ArgumentParser
    parser = ArgumentParser()
    parser.add_argument('compressed')
    print(main(parser.parse_args().compressed))

Idea: a00010 -> [a000, 10] -> [length 4, length 2] -> F(4) * F(2)
01a102b0305 -> [01, a102, b0305] -> [length 2, length 4, length 5] -> F(2) * F(4) * F(5)
where F(n) = fibonacci(n - 1) is the number of ways to partition a string of length n into a list of strings of length ≥2.

F(2) = 1 = fibonacci(1), F(3) = 1 = fibonacci(2), and F(n) = F(n - 2) + F(n - 1), so F is indeed just an offset version of the Fibonacci sequence.
To see why F(n) = F(n - 2) + F(n - 1), here are the ways to split up 'abcde': ['ab'] + (split up 'cde'), ['abc'] + (split up 'de'), and ['abcde'], corresponding to F(5) = F(3) + F(2) + 1.
And the ways to split up 'abcdef': ['ab'] + (split up 'cdef'), ['abc'] + (split up 'def'), ['abcd'] + (split up 'ef'), and ['abcdef'], corresponding to F(6) = F(4) + F(3) + F(2) + 1 = F(4) + F(5) = F(6 - 2) + F(6 - 1).
The same logic generalizes to all n >= 4.

[–] shape-warrior-t@kbin.social 1 points 1 year ago

So every list of strings, where each string is some character followed by one or more digits, is a distinct, valid decompressing option. Thanks for clarifying!

[–] shape-warrior-t@kbin.social 2 points 1 year ago (2 children)

Thanks for the update on checking through solutions, and thanks in general for all the work you've put into this community!

Would just like to clarify: what are the valid decompressed strings? For an input of a333a3, should we return 2 (either a333 a3 or a3 33 a3) or 1 (since a333 a3 isn't a possible compression -- it would be a336 instead)? Do we have to handle cases like a00010, and if so, how?

[–] shape-warrior-t@kbin.social 2 points 1 year ago* (last edited 1 year ago) (1 children)

My solution (runs in O(n) time, but so do all the other solutions so far as far as I can tell):

from itertools import pairwise

def main(s: str) -> str:
    characters = [None] + list(s) + [None]
    transitions = []
    for (_, left), (right_idx, right) in pairwise(enumerate(characters)):
        if left != right:
            transitions.append((right_idx, right))
    repeats = [(stop - start, char) for (start, char), (stop, _) in pairwise(transitions)]
    return ''.join(f'{char}{length}' for length, char in repeats)

if __name__ == '__main__':
    from argparse import ArgumentParser
    parser = ArgumentParser()
    parser.add_argument('s')
    print(main(parser.parse_args().s))

Runthrough:
'aaabb' -> [None, 'a', 'a', 'a', 'b', 'b', None] -> [(1, 'a'), (4, 'b'), (6, None)] -> [(4 - 1, 'a'), (6 - 4, 'b')]

Golfed (just for fun, not a submission):

import sys
from itertools import pairwise as p
print(''.join(c+str(b-a)for(a,c),(b,_)in p([(i,r)for i,(l,r)in enumerate(p([None,*sys.argv[1],None]))if l!=r])))

[–] shape-warrior-t@kbin.social 1 points 1 year ago (1 children)

I actually found this challenge to be easier than this week's medium challenge. (Watch me say that and get this wrong while also getting the medium one correct...) Here's an O(n) solution:

bracket_pairs = {('(', ')'), ('[', ']'), ('{', '}')}

def main(brackets: str) -> str:
    n = len(brackets)
    has_match_at = {i: False for i in range(-1, n + 1)}
    acc = []
    for i, bracket in enumerate(brackets):
        acc.append((i, bracket))
        if len(acc) >= 2:
            opening_idx, opening = acc[-2]
            closing_idx, closing = acc[-1]
            if (opening, closing) in bracket_pairs:
                acc.pop(), acc.pop()
                has_match_at[opening_idx] = has_match_at[closing_idx] = True
    longest_start, longest_end = 0, 0
    most_recent_start = None
    for left_idx, right_idx in zip(range(-1, n), range(0, n + 1)):
        has_match_left = has_match_at[left_idx]
        has_match_right = has_match_at[right_idx]
        if (has_match_left, has_match_right) == (False, True):
            most_recent_start = right_idx
        if (has_match_left, has_match_right) == (True, False):
            most_recent_end = right_idx
            if most_recent_end - most_recent_start > longest_end - longest_start:
                longest_start, longest_end = most_recent_start, most_recent_end
    return brackets[longest_start:longest_end]

if __name__ == '__main__':
    from argparse import ArgumentParser
    parser = ArgumentParser()
    parser.add_argument('brackets')
    print(main(parser.parse_args().brackets))

We start off by doing the same thing as this week's easy challenge, except we keep track of the indices of all of the matched brackets that we remove (opening or closing). We then identify the longest stretch of consecutive removed-bracket indices, and use that information to slice into the input to get the output.

For ease of implementation of the second part, I modelled the removed-bracket indices with a dict simulating a list indexed by [-1 .. n + 1), with the values indicating whether the index corresponds to a matched bracket. The extra elements on both ends are always set to False. For example, {([])()[(])}()] -> FFTTTTTTFFFFFTTFF, and ([{}]) -> FTTTTTTF. To identify stretches of consecutive indices, we can simply watch for when the value switches from False to True (start of a stretch), and from True to False (end of a stretch). We do that by pairwise-looping through the dict-list, looking for 'FT' and 'TF'.

[–] shape-warrior-t@kbin.social 1 points 1 year ago (1 children)

Interesting approach, but from my understanding of the code, it doesn't generate matches like [()][()]. I could be wrong, but I don't see how you can get that by prepending, appending, and enclosing just (), [], and/or {}.

I'm also assuming that [()][()] is supposed to be one of the results for n = 8. At least two others here seem to have made that assumption, and I believe it's consistent with the previous challenge. Would be nice to have some clarification on this, though.

[–] shape-warrior-t@kbin.social 2 points 1 year ago* (last edited 1 year ago)

Here's my Python solution. If my reasoning is all correct, it should (in theory) run in O(n * (number of matches)), which should be optimal since it takes at least that long to print out the results anyway. IF my reasoning is all correct, and my program is all correct.

closing_to_opening = {')': '(', ']': '[', '}': '{'}

def main(n: int) -> list[str]:
    assert n % 2 == 0, f'n must be even; got {n=}'
    acc: list = [(n // 2, None, None)]
    for i in range(n):
        new_acc = []
        for closing_brackets_left, unmatched_series, full_series in acc:
            if unmatched_series is not None:
                most_recent_closing_bracket, rest = unmatched_series
                matching_opening_bracket = closing_to_opening[most_recent_closing_bracket]
                new_acc.append((closing_brackets_left, rest, (matching_opening_bracket, full_series)))
            if closing_brackets_left > 0:
                for bracket in [')', ']', '}']:
                    new_acc.append((closing_brackets_left - 1, (bracket, unmatched_series), (bracket, full_series)))
        acc = new_acc
    result = []
    for _, _, series in acc:
        series_as_list = []
        for _ in range(n):
            bracket, series = series
            series_as_list.append(bracket)
        result.append(''.join(series_as_list))
    return result

if __name__ == '__main__':
    from argparse import ArgumentParser
    parser = ArgumentParser()
    parser.add_argument('n')
    result = main(int(parser.parse_args().n))
    print('\n'.join(result))

Idea: The matches of size n are precisely the strings with n/2 closing brackets, n/2 opening brackets, and brackets arranged so that each closing bracket matches up with the opening bracket on the "top of the stack" when processing the string and removing matches. We build the strings backwards. For each match-in-construction, we track the number of closing brackets left to be added, the "stack" (but working backwards, so the roles of opening and closing brackets are reversed), and, of course, the actual string. We transform each match-in-construction into 1, 3, or 4 new matches-in-construction, adding one character at a time: the opening bracket that matches the closing bracket on the top of the stack (if any), and the three closing brackets (if we still have closing brackets to add). Because appending to strings is O(n) since we need to copy, and pushing and popping from Python lists creates mutable aliasing issues (and would take O(n) to copy, just like with strings), we do a Lisp and use cons cells to create lists instead (hence, the backwards building). I suspect it gives the same asymptotic runtime anyway, but I don't actually know whether that's true.

[–] shape-warrior-t@kbin.social 6 points 1 year ago* (last edited 1 year ago) (1 children)

Here's an O(n) solution using a stack instead of repeated search & replace:

closing_to_opening = {')': '(', ']': '[', '}': '{'}
brackets = input()
acc = []
for bracket in brackets:
    if bracket in closing_to_opening:
        if acc and acc[-1] == closing_to_opening[bracket]:
            acc.pop()
        else:
            acc.append(bracket)
    else:
        acc.append(bracket)
print(''.join(acc))

Haven't thoroughly thought the problem through (so I'm not 100% confident in the correctness of the solution), but the general intuition here is that pairs of brackets can only match up if they only have other matching pairs of brackets between them. You can deal with matching pairs of brackets on the fly simply by removing them, so there's actually no need for backtracking.

Golfed, just for fun:

a=[]
[a.pop()if a and a[-1]==dict(zip(')]}','([{')).get(b)else a.append(b)for b in input()]
print(''.join(a))