The paper proposes an approach to modeling users of large Web sites based on combining different data sources: access logs and content of the accessed pages are combined with semantic information about the Web pages, the users and the accesses of the users to the Web site. The assumption is that we are dealing with a large Web site providing content to a large number of users accessing the site. The proposed approach represents each user by a set of features derived from the different data sources, where some feature values may be missing for some users. It further enables user modeling based on the provided characteristics of the targeted user subset. The approach is evaluated on real-world data where we compare performance of the automatic assignment of a user to a predefined user segment when different data sources are used to represent the users.