public class SlotPoolImpl extends Object implements SlotPool, SlotPoolService
ExecutionGraph. It will attempt to acquire
new slots from the ResourceManager when it cannot serve a slot request. If no ResourceManager is
currently available, or it gets a decline from the ResourceManager, or a request times out, it
fails the slot request. The slot pool also holds all the slots that were offered to it and
accepted, and can thus provides registered free slots even if the ResourceManager is down. The
slots will only be released when they are useless, e.g. when the job is fully running but we
still have some free slots.
All the allocation or the slot offering will be identified by self generated AllocationID, we will use it to eliminate ambiguities.
TODO : Make pending requests location preference aware TODO : Make pass location preferences to ResourceManager when sending a slot request
| Modifier and Type | Class and Description |
|---|---|
protected static class |
SlotPoolImpl.AvailableSlots
Organize all available slots from different points of view.
|
protected static class |
SlotPoolImpl.PendingRequest
A pending request for a slot.
|
| Modifier and Type | Field and Description |
|---|---|
protected boolean |
batchSlotRequestTimeoutCheckEnabled |
protected org.slf4j.Logger |
log |
| Constructor and Description |
|---|
SlotPoolImpl(org.apache.flink.api.common.JobID jobId,
org.apache.flink.util.clock.Clock clock,
org.apache.flink.api.common.time.Time rpcTimeout,
org.apache.flink.api.common.time.Time idleSlotTimeout,
org.apache.flink.api.common.time.Time batchSlotTimeout) |
| Modifier and Type | Method and Description |
|---|---|
Optional<PhysicalSlot> |
allocateAvailableSlot(SlotRequestId slotRequestId,
AllocationID allocationID,
ResourceProfile requirementProfile)
Allocates the available slot with the given allocation id under the given request id for the
given requirement profile.
|
protected void |
checkBatchSlotTimeout() |
protected void |
checkIdleSlot()
Check the available slots, release the slot that is idle for a long time.
|
void |
close()
Close the slot pool service.
|
void |
connectToResourceManager(ResourceManagerGateway resourceManagerGateway)
Connects the SlotPool to the given ResourceManager.
|
AllocatedSlotReport |
createAllocatedSlotReport(ResourceID taskManagerId)
Create report about the allocated slots belonging to the specified task manager.
|
void |
disableBatchSlotRequestTimeoutCheck()
Disables batch slot request timeout check.
|
void |
disconnectResourceManager()
Disconnects the slot pool from its current Resource Manager.
|
Optional<ResourceID> |
failAllocation(AllocationID allocationID,
Exception cause)
Fail the specified allocation and release the corresponding slot if we have one.
|
Optional<ResourceID> |
failAllocation(ResourceID resourceId,
AllocationID allocationID,
Exception cause)
Fails the allocation with the given allocationId.
|
Collection<SlotInfo> |
getAllocatedSlotsInformation()
Returns a list of
SlotInfo objects about all slots that are currently allocated in
the slot pool. |
Collection<SlotInfoWithUtilization> |
getAvailableSlotsInformation()
Returns a list of
SlotInfoWithUtilization objects about all slots that are currently
available in the slot pool. |
DualKeyLinkedMap<SlotRequestId,AllocationID,SlotPoolImpl.PendingRequest> |
getPendingRequests() |
Collection<SlotOffer> |
offerSlots(TaskManagerLocation taskManagerLocation,
TaskManagerGateway taskManagerGateway,
Collection<SlotOffer> offers)
Offers multiple slots to the
SlotPool. |
boolean |
registerTaskManager(ResourceID resourceID)
Register TaskManager to this pool, only those slots come from registered TaskManager will be
considered valid.
|
void |
releaseSlot(SlotRequestId slotRequestId,
Throwable cause)
Releases the slot with the given
SlotRequestId. |
boolean |
releaseTaskManager(ResourceID resourceId,
Exception cause)
Unregister TaskManager from this pool, all the related slots will be released and tasks be
canceled.
|
CompletableFuture<PhysicalSlot> |
requestNewAllocatedBatchSlot(SlotRequestId slotRequestId,
ResourceProfile resourceProfile)
Requests the allocation of a new batch slot from the resource manager.
|
CompletableFuture<PhysicalSlot> |
requestNewAllocatedSlot(SlotRequestId slotRequestId,
ResourceProfile resourceProfile,
org.apache.flink.api.common.time.Time timeout)
Request the allocation of a new slot from the resource manager.
|
protected void |
runAsync(Runnable runnable)
Execute the runnable in the main thread of the underlying RPC endpoint.
|
protected void |
scheduleRunAsync(Runnable runnable,
long delay,
TimeUnit unit)
Execute the runnable in the main thread of the underlying RPC endpoint, with a delay of the
given number of milliseconds.
|
protected void |
scheduleRunAsync(Runnable runnable,
org.apache.flink.api.common.time.Time delay)
Execute the runnable in the main thread of the underlying RPC endpoint, with a delay of the
given number of milliseconds.
|
void |
start(JobMasterId jobMasterId,
String newJobManagerAddress,
ComponentMainThreadExecutor componentMainThreadExecutor)
Start the slot pool to accept RPC calls.
|
protected void |
timeoutPendingSlotRequest(SlotRequestId slotRequestId) |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, waitcastInto, notifyNotEnoughResourcesAvailableprotected final org.slf4j.Logger log
protected boolean batchSlotRequestTimeoutCheckEnabled
public SlotPoolImpl(org.apache.flink.api.common.JobID jobId,
org.apache.flink.util.clock.Clock clock,
org.apache.flink.api.common.time.Time rpcTimeout,
org.apache.flink.api.common.time.Time idleSlotTimeout,
org.apache.flink.api.common.time.Time batchSlotTimeout)
public Collection<SlotInfo> getAllocatedSlotsInformation()
SlotPoolSlotInfo objects about all slots that are currently allocated in
the slot pool.getAllocatedSlotsInformation in interface SlotPoolSlotInfo objects about all slots that are currently allocated in
the slot pool.@VisibleForTesting public DualKeyLinkedMap<SlotRequestId,AllocationID,SlotPoolImpl.PendingRequest> getPendingRequests()
public void start(@Nonnull JobMasterId jobMasterId, @Nonnull String newJobManagerAddress, @Nonnull ComponentMainThreadExecutor componentMainThreadExecutor) throws Exception
start in interface SlotPoolstart in interface SlotPoolServicejobMasterId - The necessary leader id for running the job.newJobManagerAddress - for the slot requests which are sent to the resource managercomponentMainThreadExecutor - The main thread executor for the job master's main thread.Exception - if the the service cannot be startedpublic void close()
SlotPoolServiceclose in interface AutoCloseableclose in interface SlotPoolclose in interface SlotPoolServicepublic void connectToResourceManager(@Nonnull ResourceManagerGateway resourceManagerGateway)
SlotPoolconnectToResourceManager in interface SlotPoolconnectToResourceManager in interface SlotPoolServiceresourceManagerGateway - The RPC gateway for the resource manager.public void disconnectResourceManager()
SlotPoolThe slot pool will still be able to serve slots from its internal pool.
disconnectResourceManager in interface SlotPooldisconnectResourceManager in interface SlotPoolServicepublic void releaseSlot(@Nonnull SlotRequestId slotRequestId, @Nullable Throwable cause)
AllocatedSlotActionsSlotRequestId. Additionally, one can provide a cause
for the slot release.releaseSlot in interface AllocatedSlotActionsslotRequestId - identifying the slot to releasecause - of the slot release, null if nonepublic Optional<PhysicalSlot> allocateAvailableSlot(@Nonnull SlotRequestId slotRequestId, @Nonnull AllocationID allocationID, @Nonnull ResourceProfile requirementProfile)
SlotPoolIllegalStateException will be thrown.allocateAvailableSlot in interface SlotPoolslotRequestId - identifying the requested slotallocationID - the allocation id of the requested available slotrequirementProfile - resource profile of the requirement for which to allocate the slot@Nonnull public CompletableFuture<PhysicalSlot> requestNewAllocatedSlot(@Nonnull SlotRequestId slotRequestId, @Nonnull ResourceProfile resourceProfile, @Nullable org.apache.flink.api.common.time.Time timeout)
SlotPoolrequestNewAllocatedSlot in interface SlotPoolslotRequestId - identifying the requested slotresourceProfile - resource profile that specifies the resource requirements for the
requested slottimeout - timeout for the allocation procedure@Nonnull public CompletableFuture<PhysicalSlot> requestNewAllocatedBatchSlot(@Nonnull SlotRequestId slotRequestId, @Nonnull ResourceProfile resourceProfile)
SlotPoolrequestNewAllocatedBatchSlot in interface SlotPoolslotRequestId - identifying the requested slotresourceProfile - resource profile that specifies the resource requirements for the
requested batch slotpublic void disableBatchSlotRequestTimeoutCheck()
SlotPooldisableBatchSlotRequestTimeoutCheck in interface SlotPool@Nonnull public Collection<SlotInfoWithUtilization> getAvailableSlotsInformation()
SlotPoolSlotInfoWithUtilization objects about all slots that are currently
available in the slot pool.getAvailableSlotsInformation in interface SlotPoolSlotInfoWithUtilization objects about all slots that are currently
available in the slot pool.public Collection<SlotOffer> offerSlots(TaskManagerLocation taskManagerLocation, TaskManagerGateway taskManagerGateway, Collection<SlotOffer> offers)
SlotPoolSlotPool. The slot offerings can be individually
accepted or rejected by returning the collection of accepted slot offers.offerSlots in interface SlotPoolofferSlots in interface SlotPoolServicetaskManagerLocation - from which the slot offers originatetaskManagerGateway - to talk to the slot offereroffers - slot offers which are offered to the SlotPoolpublic Optional<ResourceID> failAllocation(AllocationID allocationID, Exception cause)
failAllocation in interface SlotPoolallocationID - Represents the allocation which should be failedcause - The cause of the failurepublic Optional<ResourceID> failAllocation(@Nullable ResourceID resourceId, AllocationID allocationID, Exception cause)
SlotPoolServicefailAllocation in interface SlotPoolServiceresourceId - taskManagerId is non-null if the signal comes from a TaskManager; if the
signal comes from the ResourceManager, then it is nullallocationID - allocationId identifies which allocation to failcause - cause why the allocation failedpublic boolean registerTaskManager(ResourceID resourceID)
registerTaskManager in interface SlotPoolregisterTaskManager in interface SlotPoolServiceresourceID - The id of the TaskManagerpublic boolean releaseTaskManager(ResourceID resourceId, Exception cause)
releaseTaskManager in interface SlotPoolreleaseTaskManager in interface SlotPoolServiceresourceId - The id of the TaskManagercause - for the releasing of the TaskManagerpublic AllocatedSlotReport createAllocatedSlotReport(ResourceID taskManagerId)
SlotPoolcreateAllocatedSlotReport in interface SlotPoolcreateAllocatedSlotReport in interface SlotPoolServicetaskManagerId - identifies the task manager@VisibleForTesting protected void timeoutPendingSlotRequest(SlotRequestId slotRequestId)
protected void checkIdleSlot()
protected void checkBatchSlotTimeout()
protected void runAsync(Runnable runnable)
runnable - Runnable to be executed in the main thread of the underlying RPC endpointprotected void scheduleRunAsync(Runnable runnable, org.apache.flink.api.common.time.Time delay)
runnable - Runnable to be executeddelay - The delay after which the runnable will be executedprotected void scheduleRunAsync(Runnable runnable, long delay, TimeUnit unit)
runnable - Runnable to be executeddelay - The delay after which the runnable will be executedCopyright © 2014–2021 The Apache Software Foundation. All rights reserved.