Azure HDInsight documentation
Azure HDInsight is a managed Apache Hadoop service that lets you run Apache Spark, Apache Hive, Apache Kafka, Apache HBase, and more in the cloud.
- HDInsight service
- Overview
- Tutorials
- Samples
- Concepts
- Versioning
- Enterprise Security Package
- High availability components
- Apache Ambari in Azure HDInsight
- Streaming at scale in HDInsight
- Apache Hadoop architecture in HDInsight
- HDInsight supported VM sizes
- Select the right VM size
- Cluster capacity planning
- Cluster management best practices
- Reliability and Business continuity
- Understand managed identities
- MSI Support to access Azure services
- Compare storage options
- How-to guides
- Create clusters
- Use cluster storage
- Extend clusters
- Secure
- Migrate
- Manage
- Manage clusters using the Apache Ambari web UI
- Disable auto logout from Ambari Web UI
- Optimize with Apache Ambari
- Find host names of cluster nodes
- Reboot cluster nodes
- Use Apache Ambari REST API
- Delete a cluster
- Manually scale a cluster
- Autoscale
- Use external metadata stores
- Custom Ambari DB
- Manage logs for an HDInsight cluster
- Add storage accounts
- Update storage account access key
- Upload data for Apache Hadoop jobs
- Multiple HDInsight clusters with Data Lake Storage
- Import and export data with Apache Sqoop
- Operationalize a data analytics pipeline
- Use Apache Oozie for workflows
- Cluster and service ports and URIs
- Upgrade HDInsight cluster to newer version
- OS patching for HDInsight cluster
- Use HDInsight tools
- Monitoring
- Monitor Azure HDInsight
- Use Azure Monitor Agent
- Use Azure Monitor logs
- Use queries with Azure Monitor logs
- Monitor cluster performance
- Cluster availability - Apache Ambari
- Cluster availability - Azure Monitor logs
- Monitoring data reference
- Troubleshoot
- Reference
- Resources
- Releases
- Frequently asked questions (FAQ)
- Information about using HDInsight on Linux
- Apache Hadoop memory and performance
- Access Apache Hadoop YARN application logs on Linux
- Enable heap dumps for Apache Hadoop services
- Get help on the Microsoft Q&A question page
- Preview features
- Pricing calculator
- Windows tools for HDInsight
- Apache Spark
- Overview
- Quickstarts
- Tutorials
- Concepts
- How-to guides
- Use tools
- Develop
- Optimize queries with SparkCruise
- Use an interactive Apache Spark Shell
- Remote jobs with Apache Livy
- Debug Apache Spark jobs remotely with IntelliJ through VPN
- Apache Spark streaming
- Apache Spark and Machine Learning
- Analyze big data
- Manage
- Manage dependencies
- Manage cluster resources
- Manage Apache Spark applications using extended History Server
- Enable caching with IO Cache
- Use notebooks with Apache Spark
- Use with other Azure services
- Troubleshoot
- Can't create Jupyter Notebook
- OutOfMemoryError exception
- Apache Spark job fails - NoClassDefFoundError
- Apache Spark job fails - InvalidClassException
- Slow Apache Spark jobs - storage container
- IllegalArgumentException exception
- Apache Spark Streaming application stops
- RpcTimeoutException exception
- Blocking Cross Origin API
- Kryo serialization failed
- Apache Spark event log - RequestBodyTooLarge
- Debug Apache Spark jobs
- Debug WASB file operations
- Use IntelliJ to debug Apache Spark job
- Apache Spark troubleshooting
- Known issues
- Apache Hadoop
- Overview
- Quickstarts
- Tutorials
- How-to guides
- Use tools
- Develop
- Use MapReduce with Apache Hadoop
- Use Apache Hive as an extract, transform, and load (ETL) tool
- Extract, transform, and load at scale
- Create non-interactive authentication .NET HDInsight applications
- Analyze big data
- Manage
- Troubleshoot
- Common issues
- Cluster creation failures
- Out of disk space
- Soft lockup - CPU
- InvalidNetworkConfigurationErrorCode - cluster creation fails
- InvalidNetworkSecurityGroupSecurityRules - cluster creation fails
- Unable to log into HDInsight cluster
- Unable to add nodes to HDInsight cluster
- Converting service principal certificate
- Local HDFS stuck in safe mode
- Apache Ambari heartbeat issues
- Apache Ambari UI 502 error
- Apache Ambari shows down hosts and services
- Apache Ambari stale alerts
- Apache Ambari directory alerts
- Troubleshoot a slow or failing HDInsight cluster
- Apache Hadoop HDFS troubleshooting
- Apache Hadoop YARN troubleshooting
- Invalid BCFile error from YARN log
- Troubleshoot Data Lake Store files
- Port conflict when starting services
- Lost Key Vault access
- WASBS storage exception
- Manage disk space
- Apache Kafka
- Overview
- Quickstarts
- Tutorials
- Concepts
- How-to guides
- Develop
- Manage
- Use Apache Kafka in a virtual network
- Replicate Apache Kafka data
- Use MirrorMaker2 to replicate Apache Kafka topics with Kafka
- Analyze Apache Kafka logs
- Secure Spark and Kafka streaming integration scenario
- SSL encryption and authentication for non-ESP Kafka cluster
- SSL encryption and authentication for ESP Kafka cluster
- Kafka MirrorMaker 2.0 guide
- Connect HDInsight Kafka cluster with client VM in different VNet
- Troubleshoot
- Apache HBase
- Overview
- Quickstarts
- Tutorials
- Concepts
- How-to guides
- Develop
- Manage
- Troubleshoot
- Troubleshoot Apache HBase performance issues
- hbase hbck returns inconsistencies
- Storage exception after connection reset
- No data in Apache Phoenix views
- BindException - Address in use
- Apache HBase fails to start
- Issues with region servers
- Apache Phoenix connectivity issues
- Apache HBase REST not responding
- Pegged CPU on region server
- Timeouts with 'hbase hbck' command
- REST API to query Apache HBase
- Troubleshoot data retention issues
- Interactive Query
- Overview
- Quickstarts
- Tutorials
- Concepts
- How-to guides
- Develop
- Process and analyze JSON documents
- Use C# user-defined functions
- Use Python with Apache Hive and Apache Pig
- HWC integration with Apache Spark and Apache Hive
- HWC and Apache Spark operations
- HWC integration with Apache Zeppelin
- HWC 1.0 Supported APIs
- HWC 2.0 Supported APIs
- Apache Hive with Hadoop
- Use the Apache Hive View
- Connect to Apache Beeline
- Use Apache Hive Beeline
- Use Grafana
- Use REST API
- Use Azure PowerShell
- Use SDK for .NET
- Use the HDInsight tools for Visual Studio
- Use a Java UDF with Apache Hive
- Query Hive with PowerShell and ODBC
- Enable Hive LLAP Workload Management feature
- Manage
- Troubleshoot
- Apache Hive settings fix Out of Memory error
- Apache Tez application hangs
- Apache Hive LLAP sizing guidelines
- Apache Hive LLAP query performance
- Slow reducer
- Apache Hive gateway timeout
- Apache Hive View time-out
- Permission error creating table
- GC overhead limit exceeded
- Apache Ambari Tez View loads slowly
- Error message not shown in Apache Hive View
- Inaccessible Apache Hive View
- ZooKeeperHiveClientException - HiveServer2 configs
- Apache Hive troubleshooting
- Apache Hive logs taking up entire disk space on Head node
- Security options for Hive
- Apache Hive LLAP Workload Management
- Develop
- Reference
- Enterprise readiness
- Overview
- Tutorials
- Concepts
- Security baseline
- Enterprise security general guidelines
- Plan for ESP clusters
- HDInsight virtual network architecture
- Restrict public connectivity
- Enable private link on an HDInsight cluster
- Enable private link on an HDInsight Kafka REST Proxy cluster
- Transport layer security
- Plan VNETs for HDInsight
- Control network traffic
- Required IP Addresses for NSGs and UDRs
- Customer-managed key disk encryption
- LDAP sync in Ranger and Apache Ambari
- Double encryption in transit
- Use availability zones
- How-to guides
- Use ID Broker for credential management
- Create ESP clusters and sync with on-premises
- Create VNETs for HDInsight
- Connect HDInsight with on-premises network
- Configure ESP clusters using Microsoft Entra Domain Services
- Synchronize Microsoft Entra users to an HDInsight cluster
- Manage clusters with enterprise security
- Manage SSH access
- Securing data
- Use firewall to restrict outbound traffic
- Create service endpoint policies
- Configure network virtual appliance
- Service tags for Azure Firewall
- Troubleshoot
- Azure Synapse integration