Musical expression can be found in nearly every culture throughout history, taking on an impressive variety of forms. The extensive range of instruments, the breadth of content, and the long history of music provide insight into both the systematic nature and the high degree of complexity of musical expression. Indeed, it has been argued that music – like natural languages – follows the rules of semiotics. Understanding how to computationally learn and generate music can provide crucial insights into machine creativity and pattern recognition, while potentially revealing new perspectives on human musical cognition.

Current approaches to music generation predominantly rely on deep learning architectures, which, while powerful, come with significant drawbacks. These methods require massive labelled datasets – which largely do not exist and often raise copyright concerns – and demand substantial computational resources for both training and inference. Moreover, their black-box nature can limit creative control and understanding of the generation process. Most importantly, despite their complexity, these models often struggle to capture the nuanced interplay between different musical parameters while maintaining coherent long-term structure and retaining some degree of novelty in the generated pieces. This limitation presents an opportunity to revisit and extend simpler, more constrained approaches.

In this talk, we examine several extensions to Markov chains for music learning and generation. This approach allows us to maintain algorithmic transparency and computational efficiency while pushing the boundaries of what these methods can achieve. Traditional Markov chain approaches to music generation face two key limitations: they struggle to handle multiple musical parameters simultaneously, and higher-order chains become exponentially memory-intensive.
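To make the memory blow-up concrete, here is a back-of-the-envelope sketch (the 128-symbol alphabet is the MIDI pitch range; the helper name is ours, not part of the talk):

```python
# An order-k Markov chain over N symbols needs N**k contexts,
# each holding N next-symbol probabilities.
def table_entries(n_symbols, order):
    return n_symbols ** order * n_symbols

print(table_entries(128, 1))  # 16,384 entries
print(table_entries(128, 3))  # 268,435,456 entries
```

Going from order 1 to order 3 multiplies the table size by 128², which is why naive higher-order chains quickly become impractical.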
Here, we explore the promise of decomposing the problem into multiple first-order matrices and combining their predictions, as well as the efficacy of using discounted higher-order chains both within and between the musical parameters. This approach dramatically reduces memory requirements while potentially capturing more sophisticated musical relationships through the interaction between different parameter matrices.
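The discounted higher-order idea can be sketched roughly as follows: instead of one order-k table, keep a separate first-order table for each lag and blend their predictions with geometrically decaying weights. The geometric weighting gamma**(k-1) and all function names here are our own illustrative assumptions, not the exact scheme from the talk:

```python
from collections import defaultdict

def train_lag_counts(seq, max_lag=3):
    """Count first-order transitions at each lag k: seq[i-k] -> seq[i]."""
    counts = {k: defaultdict(lambda: defaultdict(int)) for k in range(1, max_lag + 1)}
    for i in range(len(seq)):
        for k in range(1, max_lag + 1):
            if i - k >= 0:
                counts[k][seq[i - k]][seq[i]] += 1
    return counts

def discounted_distribution(counts, context, gamma=0.5):
    """Blend lag-k predictions with weights gamma**(k-1), then renormalize."""
    dist = defaultdict(float)
    for k, table in counts.items():
        prev = context[-k] if len(context) >= k else None
        row = table.get(prev)
        if not row:
            continue  # unseen context at this lag; it contributes nothing
        total = sum(row.values())
        weight = gamma ** (k - 1)
        for symbol, count in row.items():
            dist[symbol] += weight * count / total
    z = sum(dist.values())
    return {s: p / z for s, p in dist.items()} if z else {}

# Usage on a toy pitch sequence:
pitches = [60, 62, 60, 62, 60, 62]
counts = train_lag_counts(pitches)
print(discounted_distribution(counts, pitches[-3:]))
```

Each lag-k table is only N-by-N, so max_lag tables cost max_lag * N² entries rather than N^max_lag contexts; cross-parameter matrices (e.g. previous duration predicting next pitch) can be blended into the same mixture in exactly the same way.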