This paper investigates a new model to improve the scalability of low-power long-range (LoRa) networks by allowing multiple end devices (EDs) to communicate simultaneously with multiple multi-antenna gateways on the same frequency band and with the same spreading factor. The maximum likelihood (ML) decision rule is first derived for non-coherent detection of the information bits transmitted by multiple devices. To overcome the high complexity of ML detection, we propose a sub-optimal two-stage detection algorithm that balances computational complexity and error performance. In the first stage, we identify the transmitted chirps (without knowing which EDs transmitted them). In the second stage, we determine which EDs transmitted the chirps identified in the first stage. To improve the detection performance in the second stage, we also optimize the transmit powers of the EDs to minimize the similarity, measured by the Jaccard coefficient, between the received powers of any pair of EDs. As the power control optimization problem is non-convex, we use concepts from successive convex approximation to transform it into an approximate convex optimization problem that can be solved iteratively and is guaranteed to reach a sub-optimal solution. Simulation results demonstrate and justify the tradeoff between transmit power penalties and network scalability of the proposed LoRa network model. In particular, by allowing concurrent transmission of 2 or 3 EDs, the uplink capacity of the proposed network can be doubled or tripled over that of a conventional LoRa network, albeit at the expense of an additional 3.0 or 4.7 dB of transmit power, respectively.
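
To give a concrete sense of the similarity measure used in the power control step, the following is a minimal sketch, not the formulation used in this paper. It assumes that each ED is described by a vector of non-negative received powers (one entry per gateway) and that similarity is measured with the weighted form of the Jaccard coefficient (sum of element-wise minima over sum of element-wise maxima). The function names `weighted_jaccard` and `max_pairwise_similarity`, and the numerical values, are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def weighted_jaccard(p, q):
    """Weighted Jaccard similarity between two non-negative power vectors.

    Assumption: p[k] and q[k] are the received powers of two EDs at gateway k.
    Returns a value in [0, 1]; 1 means the two power profiles are identical.
    """
    if len(p) != len(q):
        raise ValueError("power vectors must have the same length")
    num = sum(min(a, b) for a, b in zip(p, q))
    den = sum(max(a, b) for a, b in zip(p, q))
    # Two all-zero vectors are treated as identical.
    return 1.0 if den == 0 else num / den


def max_pairwise_similarity(received_powers):
    """Largest weighted Jaccard similarity over all pairs of EDs."""
    return max(
        weighted_jaccard(p, q) for p, q in combinations(received_powers, 2)
    )


# Hypothetical example: two EDs observed at two gateways (linear scale).
ed1 = [1.0, 0.5]
ed2_same_power = [1.0, 0.5]   # identical received powers, hard to tell apart
ed2_half_power = [0.5, 0.25]  # ED 2 transmits at half the power

print(weighted_jaccard(ed1, ed2_same_power))
print(weighted_jaccard(ed1, ed2_half_power))
```

In this toy setting, lowering the transmit power of one ED reduces the similarity between the two received power profiles, which is the intuition behind minimizing the pairwise similarity so that the second detection stage can more easily attribute chirps to EDs.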