Today, compute GPUs are a primary enabler for accelerating a wide range of workloads. Given the rapid growth of data set sizes and computing demands, programmers have quickly reached the limits of the memory and compute resources available on one or even several GPUs managed by a single host. This talk will suggest new directions for researchers to pursue in scaling beyond the resources of a few GPUs, while also reducing the programming burden on application developers.