Interesting, if it works.
Try coupling this array's output to a separate neural network trained to interpret its input in terms of "What am I looking at?" Then feed that result into a much bigger NN trained (like GPT-4) to express in words what it's looking at. Then stick this stuff into an android body....